WO2023056876A1 - Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor - Google Patents


Info

Publication number: WO2023056876A1
Authority: WIPO (PCT)
Prior art keywords: die, memory, die group, group, area
Prior art date
Application number: PCT/CN2022/122373
Other languages: French (fr), Chinese (zh)
Inventors: 邱志威 (Qiu Zhiwei), 陈帅 (Chen Shuai), 高崧 (Gao Song)
Original Assignee: 寒武纪(西安)集成电路有限公司 (Cambricon (Xi'an) Integrated Circuit Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 寒武纪(西安)集成电路有限公司
Publication of WO2023056876A1

Classifications

    • H: ELECTRICITY
    • H01: ELECTRIC ELEMENTS
    • H01L: SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L21/00: Processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or of parts thereof
    • H01L21/70: Manufacture or treatment of devices consisting of a plurality of solid state components formed in or on a common substrate or of parts thereof; manufacture of integrated circuit devices or of parts thereof
    • H01L21/71: Manufacture of specific parts of devices defined in group H01L21/70
    • H01L21/768: Applying interconnections to be used for carrying current between separate components within a device comprising conductors and dielectrics
    • H01L23/00: Details of semiconductor or other solid state devices
    • H01L23/28: Encapsulations, e.g. encapsulating layers, coatings, e.g. for protection
    • H01L23/31: Encapsulations characterised by the arrangement or shape
    • H01L23/48: Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads, terminal arrangements; selection of materials therefor
    • H01L25/00: Assemblies consisting of a plurality of individual semiconductor or other solid state devices; multistep manufacturing processes thereof
    • H01L25/03: Assemblies in which all the devices are of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K or H10N, e.g. assemblies of rectifier diodes
    • H01L25/04: Such assemblies in which the devices do not have separate containers
    • H01L25/065: Such assemblies in which the devices are of a type provided for in group H01L27/00
    • H01L25/07: Such assemblies in which the devices are of a type provided for in group H01L29/00
    • H01L25/18: Assemblies in which the devices are of types provided for in two or more different subgroups of the same main group of groups H01L27/00 - H01L33/00, or in a single subclass of H10K or H10N
    • H01L25/50: Multistep manufacturing processes of assemblies consisting of devices, each device being of a type provided for in group H01L27/00 or H01L29/00

Definitions

  • the present invention generally relates to the field of semiconductors. More specifically, the present invention relates to vertically stacked chips, integrated circuit devices, boards and manufacturing methods thereof.
  • D2D: die-to-die
  • a die-to-die interface is a functional block that occupies a small area of a die and provides a data interface between two modules or two dies assembled in the same package.
  • Die-to-die interfaces utilize very short channels to connect modules or dies within a package, with transfer rates and bandwidths that exceed traditional chip-to-chip interfaces.
  • two modules or dies connected by a die-to-die interface are usually placed side by side, with their die-to-die interfaces adjacent, and the two die-to-die interfaces are electrically connected through the interposer below.
  • although the transfer rate and bandwidth of the die-to-die interface are excellent, when data is transferred through the underlying interposer the transmission path can be several millimeters long. An overly long path attenuates the signal and reduces the transfer speed, which still cannot meet the demands of high-intensity computing.
  • the solution of the present invention provides a vertically stacked chip, an integrated circuit device, a board and a manufacturing method thereof.
  • the present invention discloses a vertically stacked chip, including a first die group and a second die group.
  • the first die group includes a first die and a second die bonded using a face-to-face process.
  • the second die group includes a first die and a second die bonded using a face-to-face process.
  • the first die group and the second die group are bonded using a back-to-back process.
  • the present invention discloses an integrated circuit device including the aforementioned vertically stacked chips; and also discloses a board including the aforementioned integrated circuit device.
  • the present invention discloses a method for vertically stacking chips.
  • the vertically stacked chips include a first die group and a second die group.
  • the method includes: bonding the first die and the second die in the first die group face-to-face; bonding the first die and the second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.
  • the present invention stacks the dies of the same die group face-to-face and stacks adjacent die groups back-to-back, so that the transmission path between the dies of the same die group is greatly shortened, which helps to improve the transmission efficiency within the die group.
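The stacking rule above (logic faces bonded together within a die group, back sides bonded between adjacent die groups) can be sketched as a small illustrative model. This is not part of the patent disclosure; the `Die` class, the `logic_side` field, and the layer orientations are assumptions made only to make the orientation rule explicit.

```python
# Illustrative sketch (assumptions, not from the patent): a four-layer
# vertical stack where each die's logic layer faces either up or down.
from dataclasses import dataclass

@dataclass
class Die:
    name: str
    logic_side: str  # "up" or "down": which face carries the logic layer

def relation(upper: Die, lower: Die) -> str:
    """Classify how the bottom face of `upper` meets the top face of `lower`."""
    upper_face = "face" if upper.logic_side == "down" else "back"
    lower_face = "face" if lower.logic_side == "up" else "back"
    return f"{upper_face}-to-{lower_face}"

# First die group: core layer logic-down onto memory layer logic-up;
# second die group the same, stacked underneath.
stack = [
    Die("first core layer", logic_side="down"),
    Die("first memory layer", logic_side="up"),
    Die("second core layer", logic_side="down"),
    Die("second memory layer", logic_side="up"),
]

relations = [relation(stack[i], stack[i + 1]) for i in range(3)]
print(relations)  # face-to-face inside each group, back-to-back between groups
```

Under these assumed orientations, the model reproduces the claimed arrangement: the two interfaces inside each die group are face-to-face, and the single interface between the groups is back-to-back.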
  • Fig. 1 shows a top view of the layout of a package structure including a die-to-die interface;
  • Fig. 2 shows a cross-sectional view of the package structure in Fig. 1 along the dotted line;
  • Fig. 3 is a structural diagram of a board according to an embodiment of the present invention;
  • Fig. 4 is a structural diagram of an integrated circuit device according to an embodiment of the present invention;
  • Fig. 5 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 6 is a cross-sectional view of the structure of Fig. 5;
  • Fig. 7 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 8 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 9 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 10 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 11 is a flow chart of manufacturing the vertically stacked chip of Fig. 5 according to another embodiment of the present invention;
  • Fig. 12 is a flow chart of manufacturing the vertically stacked chip of Fig. 7 according to another embodiment of the present invention;
  • Fig. 13 is a flow chart of manufacturing the vertically stacked chip of Fig. 8 according to another embodiment of the present invention;
  • Fig. 14 is a flow chart of manufacturing the vertically stacked chip of Fig. 9 according to another embodiment of the present invention;
  • Fig. 15 is a flow chart of realizing back-to-back stacking according to another embodiment of the present invention;
  • Fig. 16 is a cross-sectional view illustrating step 1501;
  • Fig. 17 is a cross-sectional view illustrating step 1504;
  • Fig. 18 is a cross-sectional view illustrating step 1505;
  • Fig. 19 is a cross-sectional view illustrating step 1505;
  • Fig. 20 is a cross-sectional view illustrating step 1505;
  • Fig. 21 is a cross-sectional view illustrating step 1505;
  • Fig. 22 is a cross-sectional view illustrating step 1506;
  • Fig. 23 is a cross-sectional view illustrating step 1507;
  • Fig. 24 is a cross-sectional view illustrating step 1508;
  • Fig. 25 is a cross-sectional view illustrating step 1509;
  • Fig. 26 is a cross-sectional view illustrating step 1511.
  • the term “if” may be interpreted as “when” or “once” or “in response to determining” or “in response to detecting” depending on the context.
  • a die-to-die interface is, like any other chip-to-chip interface, a data link channel established between two dies.
  • the die-to-die interface is logically divided into physical layer, link layer, and transaction layer, and provides a standardized parallel interface to the internal interconnect structure.
  • the layout of the package structure is located in a molding compound area 10 of a chip.
  • the molding compound area 10 includes a system area and a storage area.
  • An exemplary system area is located in the center of the molding compound area 10 for placing two SoCs 101 , and storage areas are respectively located on both sides of the system area for placing eight off-chip memories 102 .
  • the system area also has a die-to-die area 103 , a physical area 104 and an input-output area 105 .
  • the die-to-die area 103 is formed with a transceiver circuit for data sharing between the two SoCs 101;
  • the physical area 104 is formed with a physical access circuit for accessing the off-chip memory 102;
  • the input-output area 105 is formed with an input-output circuit, which serves as the interface for external communication of the SoC 101.
  • the memory 106 is also placed in the system area as temporary storage for the SoC 101; its capacity is smaller than that of the off-chip memory 102, but its data transfer rate is higher.
  • FIG. 2 shows a cross-sectional view of the package structure in FIG. 1 along the dotted line direction.
  • the system area is divided into upper and lower layers.
  • the upper layer is the SoC 101
  • the lower layer is the transceiver circuit of the die-to-die area 103 , the memory 106 and the I/O circuit of the I/O area 105 .
  • the packaging structure further includes an interposer 201 and a substrate 202 , and the interposer 201 is disposed on the substrate 202 .
  • the path is: the SoC 101 at the sending end → the transceiver circuit of the die-to-die area 103 at the sending end → the interposer 201 → the transceiver circuit of the die-to-die area 103 at the receiving end → the SoC 101 at the receiving end, which realizes the technical effect of low delay and low power consumption for the die-to-die interface.
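A back-of-the-envelope comparison of the two routing styles discussed in this document may help fix the orders of magnitude. All lengths below are illustrative assumptions, not figures from the patent: a side-by-side die-to-die link crosses millimeters of interposer routing, while a vertically stacked link crosses only TSVs tens of micrometers tall.

```python
# Assumed, order-of-magnitude path lengths (in micrometers) for the two
# die-to-die routing styles; none of these numbers appear in the patent.
interposer_path_um = {
    "sender D2D area -> interposer": 100,
    "interposer trace between dies": 2000,   # assumed: ~2 mm side-by-side
    "interposer -> receiver D2D area": 100,
}
tsv_path_um = {
    "first TSV crossing": 50,                # assumed: ~50 um per thinned die
    "second TSV crossing": 50,
}

interposer_total = sum(interposer_path_um.values())
tsv_total = sum(tsv_path_um.values())
print(f"interposer route: {interposer_total} um, vertical route: {tsv_total} um")
```

Even with generous assumptions for the vertical route, the millimeter-scale interposer channel is over an order of magnitude longer, which is the signal-attenuation problem the vertical stacking of this invention is meant to avoid.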
  • FIG. 3 shows a schematic structural diagram of a board 30 according to an embodiment of the present invention.
  • the board 30 includes a chip 301, which is a system-on-chip integrating one or more combined processing devices.
  • the combined processing device is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms to meet the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining.
  • deep learning technology is widely used in the field of cloud intelligence.
  • a notable feature of cloud intelligence applications is the large amount of input data, which has high requirements for the storage capacity and computing power of the platform.
  • the board 30 of this embodiment is suitable for cloud intelligence applications, featuring huge off-chip storage, large on-chip storage, and powerful computing capability.
  • the chip 301 is connected to an external device 303 through an external interface device 302 .
  • the external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, a Wi-Fi interface, or the like.
  • the data to be processed can be transmitted to the chip 301 by the external device 303 through the external interface device 302 .
  • the calculation result of the chip 301 can be sent back to the external device 303 via the external interface device 302 .
  • the external interface device 302 may have different interface forms, such as a PCIe interface and the like.
  • the chip 301 includes computing means and processing means.
  • the computing device is configured to perform operations specified by the user, and is mainly implemented as a single-core intelligent processor or a multi-core intelligent processor, which is used to perform deep learning or machine learning calculations.
  • the processing device performs basic control including but not limited to data transfer, starting and/or stopping the computing device, and the like.
  • the processing means may be one or more types of processors among a central processing unit (CPU), a graphics processing unit (GPU), or other general-purpose and/or special-purpose processors.
  • such processors include but are not limited to digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • the board 30 also includes a storage device 304 for storing data, which includes one or more storage units 305 .
  • the storage device 304 is connected to, and exchanges data with, the control device 306 and the chip 301 through a bus.
  • the control device 306 in the board 30 is configured to regulate the state of the chip 301 .
  • the control device 306 may include a microcontroller (Micro Controller Unit, MCU).
  • FIG. 4 shows the structure of the combined processing device in the board 30.
  • the combined processing device 40 includes a computing device 401 , an interface device 402 , a processing device 403 and an off-chip memory 404 .
  • the computing device 401 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning calculations; it can interact with the processing device 403 through the interface device 402 to jointly complete the user-specified operations.
  • the interface device 402 is connected to the bus for connecting with other devices, such as the control device 306 and the external interface device 302 in FIG. 3 .
  • the processing device 403 performs basic control including but not limited to data transfer, starting and/or stopping of the computing device 401 .
  • the processing device 403 may be one or more types of processors among a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors; these include but are not limited to digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • the computing device 401 of this embodiment, considered alone, can be regarded as having a single-core or homogeneous multi-core structure; however, when the computing device 401 and the processing device 403 are considered together, they form a heterogeneous multi-core structure.
  • the off-chip memory 404 is used to store data to be processed; it is typically a DDR memory with a capacity of 16 GB or more, and stores data of the computing device 401 and/or the processing device 403.
  • Fig. 5 shows a schematic diagram of vertical stacking according to an embodiment of the present invention.
  • this embodiment is a multi-core chip including a first die group and a second die group; the first die group includes a first core layer 51 and a first memory layer 52, and the second die group includes a second core layer 53 and a second memory layer 54. In practice, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are vertically stacked together in sequence; the layers in Fig. 5 are drawn separated vertically for convenience of illustration only.
  • the first core layer 51 implements the processor-core function and includes a first computing area 511, which covers the logic side of the first core layer 51, i.e. the top side of the first core layer 51 in the figure. The first core layer 51 also includes, in dedicated areas, a first die-to-die area 512 and first through-silicon vias (TSVs) 513. A first computing circuit is formed in the first computing area 511 to realize the function of the computing device 401; a first transceiver circuit is formed in the first die-to-die area 512 to serve as the die-to-die interface of the first computing circuit; and the first TSVs 513 are used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
  • the first memory layer 52 implements the function of on-chip memory, including a first memory area 521 , a first input/output area 522 , a first physical area 523 and a second TSV 524 .
  • the first memory area 521 is formed with a storage unit for temporarily storing the operation result of the first operation circuit.
  • the first input-output area 522 is formed with a first input-output circuit, which is used as an interface for the first core layer 51 to communicate with the first memory layer 52 , that is, to realize the function of the interface device 402 .
  • the first physical area 523 is formed with a first physical access circuit for accessing the off-chip memory 404.
  • the second TSVs 524 are spread over the entire first memory layer 52 (only one side is shown as an example) and are used to electrically connect specific elements.
  • the second core layer 53 implements the processor-core function and includes a second computing area 531, which covers the logic side of the second core layer 53, i.e. the top side of the second core layer 53 in the figure. The second core layer 53 also includes, in dedicated areas, a second die-to-die area 532 and third TSVs 533. A second computing circuit is formed in the second computing area 531 to realize the function of the processing device 403; a second transceiver circuit is formed in the second die-to-die area 532 to serve as the die-to-die interface of the second computing circuit; and the third TSVs 533 are likewise used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
  • the second memory layer 54 implements the function of on-chip memory, including a second memory area 541 , a second input/output area 542 , a second physical area 543 and a fourth TSV 544 .
  • the second memory area 541 is formed with a storage unit for temporarily storing the operation result of the second operation circuit.
  • the second input-output area 542 is formed with a second input-output circuit, which is used as an interface for the second core layer 53 to communicate with the second memory layer 54 , that is, to realize the function of the interface device 402 .
  • the second physical area 543 has a second physical access circuit for accessing the off-chip memory 404 .
  • the fourth TSV 544 spreads over the entire second memory layer 54 , and is only shown on one side as an example, for electrically connecting specific components.
  • in this embodiment, the TSVs of each layer include transceiver TSVs, input-output TSVs and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the computing circuits to the off-chip memory 404.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 511 → first transceiver circuit of the first die-to-die area 512 → transceiver TSV of the first TSVs 513 → transceiver TSV of the second TSVs 524 → second transceiver circuit of the second die-to-die area 532 → second computing circuit of the second computing area 531; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
  • when the data in the first memory area 521 is to be transmitted to other off-chip devices, the data reaches them through the following path: first input-output circuit of the first input-output area 522 → input-output TSV of the second TSVs 524 → input-output TSV of the third TSVs 533 → input-output TSV of the fourth TSVs 544; when other off-chip devices want to transmit data to the first memory area 521, the data arrives at the first memory area 521 through the reverse of this path.
  • when the data in the second memory area 541 is to be transmitted to other off-chip devices, the data reaches them through the following path: second input-output circuit of the second input-output area 542 → input-output TSV of the fourth TSVs 544; when other off-chip devices want to transmit data to the second memory area 541, the data arrives at the second memory area 541 through the reverse of this path.
  • when the data in the first memory area 521 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit of the first physical area 523 → physical TSV of the second TSVs 524 → physical TSV of the third TSVs 533 → physical TSV of the fourth TSVs 544; when the off-chip memory 404 intends to transmit input data to the first memory area 521 for processing by the computing device 401, the data reaches the first memory area 521 through the reverse of this path.
  • when the data in the second memory area 541 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit of the second physical area 543 → physical TSV of the fourth TSVs 544; when the off-chip memory 404 intends to transmit input data to the second memory area 541 for processing by the processing device 403, the data reaches the second memory area 541 through the reverse of this path.
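The transmission paths above can be written down as hop lists. The following sketch is hypothetical (the dictionary keys and node labels are shorthand for the reference numerals in the text, not structures defined by the patent); it shows two of the paths and makes explicit that each "reverse path" is simply the forward path traversed in the opposite order.

```python
# Illustrative hop-list model of two of the transmission paths described
# above; node labels are hypothetical shorthand for the reference numerals.
PATHS = {
    "computing device 401 -> processing device 403": [
        "first computing circuit (511)", "first transceiver circuit (512)",
        "transceiver TSV (513)", "transceiver TSV (524)",
        "second transceiver circuit (532)", "second computing circuit (531)",
    ],
    "first memory area 521 -> off-chip memory 404": [
        "first physical access circuit (523)", "physical TSV (524)",
        "physical TSV (533)", "physical TSV (544)", "off-chip memory 404",
    ],
}

def reverse_path(path):
    """The text's 'reverse path': same hops, opposite order."""
    return list(reversed(path))

# Every return path starts at the forward path's destination and ends at
# its source.
for hops in PATHS.values():
    back = reverse_path(hops)
    assert back[0] == hops[-1] and back[-1] == hops[0]
```

Note that every hop is either a circuit or a TSV; no hop ever touches an interposer, in contrast to the path through interposer 201 in Fig. 2.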
  • FIG. 6 shows a cross-sectional view of the structure of FIG. 5 .
  • the first core layer 51 is used in conjunction with the first memory layer 52
  • the second core layer 53 is used in conjunction with the second memory layer 54.
  • the first core layer 51 and the first memory layer 52 are bonded with a face-to-face process, that is, the logic side of the first core layer 51 bearing the first computing area 511 is bonded to the logic side of the first memory layer 52 bearing the first memory area 521, so that the transmission path between the first computing circuit and the first memory area 521 is the shortest.
  • likewise, the logic side of the second core layer 53 bearing the second computing area 531 is bonded to the logic side of the second memory layer 54 bearing the second memory area 541, so that the transmission path between the second computing circuit and the second memory area 541 is the shortest.
  • the first die group and the second die group are bonded with a back-to-back process, that is, the back side of the first memory layer 52 (the side opposite its logic side) is bonded to the back side of the second core layer 53.
  • the first die-to-die area 512 and the second die-to-die area 532 are vertically stacked, so that the die-to-die interface of the first core layer 51 is directly electrically connected, through the first TSVs 513 and the second TSVs 524, to the die-to-die interface of the second core layer 53, without transmitting through an interposer such as the interposer 201 shown in Fig. 2.
  • in other words, the first die group in this embodiment includes a first die and a second die bonded face-to-face, and the second die group likewise includes a first die and a second die bonded face-to-face, while the first die group and the second die group are bonded back-to-back. The first die may be either the processor core or the memory, and the second die is the other of the two; they are used in conjunction with each other.
  • the positions of the first core layer 51 and the first memory layer 52 of the first die group can be reversed, and the positions of the second core layer 53 and the second memory layer 54 of the second die group can be reversed.
  • the second memory layer 54 of the second die group of this structure is located between the first core layer 51 of the first die group and the second core layer 53 of the second die group, and the second memory layer 54 is formed with transceiver TSVs for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 511 → first transceiver circuit of the first die-to-die area 512 → transceiver TSV of the first TSVs 513 → transceiver TSV of the fourth TSVs 544 → second transceiver circuit of the second die-to-die area 532 → second computing circuit of the second computing area 531; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
  • the computing device 401 or the processing device 403 performs data exchange with other off-chip devices through the interface device 402, and the first memory area 521 or the second memory area 541 communicates with the off-chip memory 404.
  • the path of data transmission is similar to the embodiment in FIG. 5 , and those skilled in the art can easily deduce it, so it is not described in detail.
  • FIG. 8 shows a schematic diagram of vertical stacking in this embodiment.
  • the vertically stacked chip in this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. From top to bottom, the first die group consists of the third memory layer 85 (third die), the first core layer 81 (first die) and the first memory layer 82 (second die), and the second die group consists of the fourth memory layer 86 (second die), the second core layer 83 (first die) and the second memory layer 84 (third die); that is, the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83.
  • the layers in FIG. 8 are visually separated up and down and shown in this way for convenience of illustration only.
  • the first core layer 81, the first memory layer 82, the second core layer 83 and the second memory layer 84 are the same as the first core layer 51, the first memory layer 52, the second core layer 53 and the second memory layer 54 in the foregoing embodiments, so details are not repeated here.
  • the third memory layer 85 includes a third memory area 851 and a fifth TSV 852 , the third memory area 851 covers the logic layer of the third memory layer 85 , that is, the top side of the third memory layer 85 in the figure.
  • the third memory area 851 is formed with storage units for temporarily storing the calculation results of the first calculation circuit.
  • the fifth TSVs 852 are spread over the entire third memory layer 85 (only one side is shown as an example) and are used to electrically connect specific components.
  • the third memory layer 85 is only responsible for temporarily storing the calculation results of the first computing circuit; it does not handle communication outside the first die group.
  • the first computing circuit can use the temporary storage space of both the first memory area 821 and the third memory area 851: when the computing device 401 wants to temporarily store intermediate data, it can store it in the third memory area 851 through the fifth TSVs 852, or in the first memory area 821 through the first TSVs 813.
  • the fourth memory layer 86 includes a fourth memory area 861 and sixth TSVs 862 .
  • the fourth memory area 861 covers the logic side of the fourth memory layer 86, that is, the top side of the fourth memory layer 86 in the figure.
  • the fourth memory area 861 has storage units for temporarily storing the operation results of the second operation circuit.
  • the sixth TSVs 862 are spread over the entire fourth memory layer 86 and are shown on one side only by way of example; they are used to electrically connect components.
  • the fourth memory layer 86 is only responsible for temporarily storing the calculation results of the second computing circuit and is not responsible for the second die group's off-chip communication.
  • the second computing circuit can use the temporary storage space of both the second memory area 841 and the fourth memory area 861: when the processing device 403 needs to temporarily store intermediate data, it can store it in the fourth memory area 861 through the sixth TSV 862, or in the second memory area 841 through the second TSV 833.
  • where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs, and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the operation circuits to the off-chip memory 404.
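Purely as an illustrative software model (not part of the claimed hardware), the three TSV roles above can be captured as an enumeration plus a lookup that picks the role a given kind of traffic would use; all identifiers here are hypothetical.

```python
from enum import Enum

class TsvRole(Enum):
    """Hypothetical model of the three TSV roles described above."""
    TRANSCEIVER = "electrically connects the first and second transceiver circuits"
    INPUT_OUTPUT = "electrically conducts the data of the input-output circuits"
    PHYSICAL = "conducts operation results to the off-chip memory 404"

def role_of(traffic: str) -> TsvRole:
    """Pick the TSV role used by a given (hypothetical) kind of traffic."""
    return {
        "die-to-die": TsvRole.TRANSCEIVER,  # inter-layer transceiver traffic
        "io": TsvRole.INPUT_OUTPUT,         # off-chip device traffic
        "dram": TsvRole.PHYSICAL,           # off-chip memory 404 traffic
    }[traffic]
```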
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 811 → first transceiver circuit of the first die-to-die area 812 → transceiver TSV of the first TSV 813 → transceiver TSV of the third TSV 824 → transceiver TSV of the sixth TSV 862 → second transceiver circuit of the second die-to-die area 832 → second computing circuit of the second computing area 831; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of the aforementioned path.
  • when the first die group is to transmit data off-chip, the data reaches other off-chip devices through the following path: first input-output circuit of the first input-output area 822 → input-output TSV of the third TSV 824 → input-output TSV of the sixth TSV 862 → input-output TSV of the second TSV 833 → input-output TSV of the fourth TSV 844; when other off-chip devices intend to transmit data to the first die group, the data arrives at the first memory area 821 through the reverse of the aforementioned path.
  • when the second die group is to transmit data off-chip, the data reaches other off-chip devices through the following path: second input-output circuit of the second input-output area 842 → input-output TSV of the fourth TSV 844; when other off-chip devices intend to transmit data to the second die group, the data arrives at the second memory area 841 through the reverse of the aforementioned path.
  • when the data of the first die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit of the first physical area 823 → physical TSV of the third TSV 824 → physical TSV of the sixth TSV 862 → physical TSV of the second TSV 833 → physical TSV of the fourth TSV 844; when the off-chip memory 404 intends to transmit input data to the first die group, the data arrives at the first memory area 821 through the reverse of the aforementioned path.
  • when the data of the second die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit of the second physical area 843 → physical TSV of the fourth TSV 844; when the off-chip memory 404 intends to transmit input data to the second die group for processing by the processing device 403, the data arrives at the second memory area 841 through the reverse of the aforementioned path.
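The transmission paths above are ordered hop lists, and the "reverse path" rule is simply the same list traversed backwards. The following Python sketch is purely illustrative (the hop names echo the reference numerals in the text; nothing here is part of the claimed hardware).

```python
# Hypothetical model of two of the FIG. 8 transmission paths described above.
DIE_TO_DIE_PATH = [
    "first computing circuit (area 811)",
    "first transceiver circuit (area 812)",
    "transceiver TSV of first TSV 813",
    "transceiver TSV of third TSV 824",
    "transceiver TSV of sixth TSV 862",
    "second transceiver circuit (area 832)",
    "second computing circuit (area 831)",
]

PHYSICAL_PATH_GROUP1 = [
    "first physical access circuit (area 823)",
    "physical TSV of third TSV 824",
    "physical TSV of sixth TSV 862",
    "physical TSV of second TSV 833",
    "physical TSV of fourth TSV 844",
    "off-chip memory 404",
]

def reverse_path(path):
    """Return the reverse path used when data flows the other way."""
    return list(reversed(path))
```

Reversing a path twice yields the original path, matching the symmetric "aforementioned reverse path" wording used throughout this embodiment.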
  • the first core layer 81 is used in conjunction with the first memory layer 82 and the third memory layer 85
  • the second core layer 83 is used in conjunction with the second memory layer 84 and the fourth memory layer 86.
  • the first core layer 81 and the first memory layer 82 adopt a face-to-face bonding process, so that the transmission path between the first computing circuit and the first memory area 821 is the shortest, while the first core layer 81 and the third memory layer 85 adopt a face-to-back bonding process.
  • the second core layer 83 and the fourth memory layer 86 adopt a face-to-face bonding process, which likewise makes the transmission path between the second computing circuit and the fourth memory area 861 the shortest, while the second core layer 83 and the second memory layer 84 adopt a face-to-back bonding process.
  • the first die group and the second die group adopt a back-to-back bonding process, that is, the first memory layer 82 and the fourth memory layer 86 adopt a back-to-back bonding process.
  • the first die-to-die area 812 and the second die-to-die area 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 is directly electrically connected to the die-to-die interface of the second core layer 83 through the first TSV 813, the third TSV 824, and the sixth TSV 862, without using the interposer 201 shown in FIG. 2 for transmission.
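The six-layer stack of FIG. 8 and the bond between each adjacent pair of layers can be written down as a small data model. This is an illustrative sketch only; the `Bond` type and the helper are hypothetical names introduced here, not part of the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bond:
    upper: str   # the layer above the bond
    lower: str   # the layer below the bond
    kind: str    # "face-to-face", "face-to-back", or "back-to-back"

# Top-to-bottom stack of FIG. 8 with the bond between each adjacent pair,
# as described in the text above.
FIG8_BONDS = [
    Bond("third memory layer 85", "first core layer 81", "face-to-back"),
    Bond("first core layer 81", "first memory layer 82", "face-to-face"),
    Bond("first memory layer 82", "fourth memory layer 86", "back-to-back"),
    Bond("fourth memory layer 86", "second core layer 83", "face-to-face"),
    Bond("second core layer 83", "second memory layer 84", "face-to-back"),
]

def stack_order(bonds):
    """Recover the top-to-bottom layer order from an adjacent bond list."""
    order = [bonds[0].upper]
    for b in bonds:
        order.append(b.lower)
    return order
```

Note that the single back-to-back bond sits exactly at the boundary between the two die groups, which is the structural point the embodiment emphasizes.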
  • FIG. 9 shows a schematic diagram of vertical stacking in this embodiment.
  • the vertically stacked chips are stacked from top to bottom into a first die group, a second die group and a third die group.
  • the first die group is respectively the first core layer 91 (first die) and the first memory layer 92 (second die) from top to bottom
  • the second die group is respectively the second core layer 93 (first die) and the second memory layer 94 (second die) from top to bottom.
  • the third die group only includes the third memory layer 95, so the third memory layer 95 is located under the second memory layer 94.
  • the layers in FIG. 9 are visually separated up and down and shown in this way for convenience of illustration only.
  • the first core layer 91 includes a first operation area 911 that covers the logic side of the first core layer 91, that is, the top side of the first core layer 91 in the figure; the first core layer 91 also includes, in a specific area, a first die-to-die area 912.
  • the first memory area 921 has storage units for temporarily storing the calculation results of the first calculation circuit.
  • the second core layer 93 includes a second operation area 931 that covers the logic side of the second core layer 93, that is, the top side of the second core layer 93 in the figure; the second core layer 93 also includes, in a specific area, a second die-to-die area 932.
  • the third memory layer 95 includes a third memory area 951, a first input-output area 952, a second input-output area 953, a first physical access area 954, a second physical access area 955, and a fifth TSV 956.
  • the third memory area 951 is formed with storage units for temporarily storing the calculation results of the first operation circuit or the second operation circuit;
  • the first input-output area 952 is formed with a first input-output circuit serving as an interface for the first die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the second input-output area 953 is formed with a second input-output circuit serving as an interface for the second die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the first physical area 954 is formed with a first physical access circuit for connecting the first die group and the off-chip memory 404;
  • the second physical area 955 is formed with a second physical access circuit for connecting the second die group and the off-chip memory 404.
  • the TSVs are present throughout each layer and are shown on one side only by way of example; where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs, and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the operation circuits to the off-chip memory 404.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 911 → first transceiver circuit of the first die-to-die area 912 → transceiver TSV of the first TSV 913 → transceiver TSV of the second TSV 922 → second transceiver circuit of the second die-to-die area 932 → second computing circuit of the second computing area 931; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of the aforementioned path.
  • the first die group and the second die group are not directly connected off-chip; when they need to communicate off-chip, this embodiment routes the traffic through the third memory layer 95 of the third die group.
  • when the data in the first memory area 921 is to be transmitted off-chip, it is first transmitted to the third memory area 951 for temporary storage through the input-output TSVs of each layer, and then leaves the third memory area 951 for other off-chip devices through the following path: first input-output circuit of the first input-output area 952 → first input-output TSV of the fifth TSV 956; when other off-chip devices intend to transmit data to the first die group, the data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the first memory area 921.
  • when the data in the second memory area 941 is to be transmitted off-chip, it is first transmitted to the third memory area 951 for temporary storage through the input-output TSVs of each layer, and then leaves the third memory area 951 for other off-chip devices through the following path: second input-output circuit of the second input-output area 953 → second input-output TSV of the fifth TSV 956; when other off-chip devices intend to transmit data to the second die group, the data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the second memory area 941.
  • when the data in the first memory area 921 is to be transmitted to the off-chip memory 404, it is first transmitted to the third memory area 951 for temporary storage through the physical TSVs of each layer, and then leaves the third memory area 951 for the off-chip memory 404 through the following path: first physical access circuit of the first physical area 954 → first physical TSV of the fifth TSV 956; when the off-chip memory 404 intends to transmit input data to the first die group, the input data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the first memory area 921.
  • when the data in the second memory area 941 is to be transmitted to the off-chip memory 404, it is first transmitted to the third memory area 951 for temporary storage through the physical TSV of the fourth TSV, and then leaves the third memory area 951 for the off-chip memory 404 through the following path: second physical access circuit of the second physical area 955 → second physical TSV of the fifth TSV 956; when the off-chip memory 404 intends to transmit input data to the second die group, the input data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the second memory area 941 through the physical TSV of the fourth TSV.
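The common pattern in the FIG. 9 embodiment is that every off-chip transfer is staged in the third memory area 951 before leaving or entering a die group. A minimal, purely illustrative sketch of that routing rule (all names hypothetical, mirroring the reference numerals in the text):

```python
# Sketch of the FIG. 9 routing rule described above: the first and second
# die groups have no direct off-chip connection, so every transfer is
# staged in the third memory area 951 first.

STAGING = "third memory area 951 (staging)"

def route_off_chip(source_area: str, destination: str) -> list:
    """Hop sequence for an outbound transfer from a die group's memory area."""
    assert source_area in ("first memory area 921", "second memory area 941")
    return [source_area, STAGING, destination]

def route_inbound(source: str, target_area: str) -> list:
    """Inbound data takes the reverse path, again staged in area 951."""
    return list(reversed(route_off_chip(target_area, source)))
```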
  • the first core layer 91 is used in conjunction with the first memory layer 92
  • the second core layer 93 is used in conjunction with the second memory layer 94.
  • the first core layer 91 and the first memory layer 92 adopt a face-to-face bonding process, making the transmission path between the first computing circuit and the first memory area 921 the shortest; the second core layer 93 and the second memory layer 94 likewise adopt a face-to-face bonding process, making the transmission path between the second computing circuit and the second memory area 941 the shortest.
  • the first die group and the second die group adopt a back-to-back bonding process, that is, the first memory layer 92 and the second core layer 93 are bonded back to back, while the second die group and the third die group adopt a face-to-back bonding process, that is, the second memory layer 94 and the third memory layer 95 are bonded face to back.
  • the first die-to-die area 912 and the second die-to-die area 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 is directly electrically connected to the second TSV 922 through the first TSV 913, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 10 shows a schematic diagram of vertical stacking of this embodiment.
  • the vertically stacked chips are stacked from top to bottom into a first die group, a second die group and a third die group.
  • the first die group is respectively the third memory layer B and the first core layer A from top to bottom
  • the second die group is respectively the first memory layer D and the second core layer C from top to bottom
  • the third die group includes only the second memory layer E.
  • the only difference between the vertical stacking structure of this embodiment and the embodiment in FIG. 9 is that the positions of the core layer and the memory layer of the first die group and the second die group are swapped.
  • FCBGA (flip chip ball grid array) uses small solder balls instead of pins to connect circuits, providing the shortest external connection distance.
  • CoWoS (chip on wafer on substrate) is an integrated production technology: the dies are first attached to a silicon wafer through the CoW packaging process, and the CoW assembly is then attached to the substrate to form CoWoS. This technology allows multiple dies to be packaged together, achieving a small package size, low power consumption, and fewer pins.
  • another embodiment of the present invention is a method of manufacturing a vertically stacked chip. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes the first core layer 51 (first die) and the first memory layer 52 (second die), and the second die group includes the second core layer 53 (first die) and the second memory layer 54 (second die); in another case, the first die may be a memory and the second die may be a processor core. Its flowchart is shown in FIG. 11.
  • a first transceiver circuit is formed in a first die-to-die region 512 in the first core layer 51 .
  • the second transceiver circuit is formed in the second die-to-die region 532 in the second core layer 53 .
  • in step 1103, transceiver TSVs are formed in the first memory layer 52.
  • in step 1104, input-output TSVs are formed in the second core layer 53 and the second memory layer 54.
  • in step 1105, physical TSVs are formed in the second core layer 53 and the second memory layer 54.
  • the first memory layer 52 is arranged between the first core layer 51 and the second core layer 53, that is, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are stacked in order from top to bottom.
  • the first core layer 51 and the first memory layer 52 are bonded face to face.
  • the second core layer 53 and the second memory layer 54 are bonded face to face.
  • the first die group and the second die group are bonded back to back.
  • the first computing area 511 and the second computing area 531 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the first memory layer 52 electrically connect the first transceiver circuit and the second transceiver circuit.
  • the data in the first memory area 521 is transmitted to the outside of the vertically stacked chip through the first input-output area 522 and the input-output TSVs, and the data in the second memory area 541 is transmitted to the outside of the vertically stacked chip through the second input-output area 542 and the input-output TSVs;
  • the calculation results of the first computing area 511 are transmitted to the off-chip memory 404 through the first physical area 523 and the physical TSVs, and the calculation results of the second computing area 531 are transmitted to the off-chip memory 404 through the second physical area 543 and the physical TSVs.
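The FIG. 11 method can be summarized as an ordered checklist. The sketch below is illustrative only: the text explicitly labels steps 1103-1105, and steps 1101 and 1102 are inferred here from the sequence (an assumption, not stated in the source).

```python
# Hypothetical condensation of the FIG. 11 flow described above.
FIG11_STEPS = {
    1101: "form first transceiver circuit in first die-to-die area 512",   # inferred number
    1102: "form second transceiver circuit in second die-to-die area 532", # inferred number
    1103: "form transceiver TSVs in first memory layer 52",
    1104: "form input-output TSVs in second core layer 53 and second memory layer 54",
    1105: "form physical TSVs in second core layer 53 and second memory layer 54",
}

def ordered_steps(steps: dict) -> list:
    """Return the step descriptions in ascending step-number order."""
    return [steps[k] for k in sorted(steps)]
```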
  • another embodiment of the present invention is a method of manufacturing a vertically stacked chip. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes the first core layer 51 (first die) and the first memory layer 52 (second die), and the second die group includes the second core layer 53 (first die) and the second memory layer 54 (second die); in another case, the first die may be a memory and the second die may be a processor core.
  • a first transceiver circuit is formed in the first die-to-die region 512 in the first core layer 51 .
  • the second transceiver circuit is formed in the second die-to-die region 532 in the second core layer 53 .
  • in step 1203, transceiver TSVs are formed in the second memory layer 54.
  • in step 1204, input-output TSVs are formed in the second core layer 53 and the second memory layer 54.
  • in step 1205, physical TSVs are formed in the second core layer 53 and the second memory layer 54.
  • the second memory layer 54 is arranged between the first core layer 51 and the second core layer 53, that is, the first memory layer 52, the first core layer 51, the second memory layer 54, and the second core layer 53 are stacked in order from top to bottom.
  • the first core layer 51 and the first memory layer 52 are bonded face to face.
  • the second core layer 53 and the second memory layer 54 are bonded face to face.
  • the first die group and the second die group are bonded back to back.
  • the first computing area 511 and the second computing area 531 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the second memory layer 54 electrically connect the first transceiver circuit and the second transceiver circuit.
  • the data in the first memory area 521 is transmitted to the outside of the vertically stacked chip through the first input-output area 522 and the input-output TSVs, and the data in the second memory area 541 is transmitted to the outside of the vertically stacked chip through the second input-output area 542 and the input-output TSVs;
  • the calculation results of the first computing area 511 are transmitted to the off-chip memory 404 through the first physical area 523 and the physical TSVs, and the calculation results of the second computing area 531 are transmitted to the off-chip memory 404 through the second physical area 543 and the physical TSVs.
  • another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 8. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, the first die group being stacked on the second die group; the first die group includes the first core layer 81 (first die), the first memory layer 82 (second die), and the third memory layer 85 (third die), and the second die group includes the second core layer 83 (first die), the second memory layer 84 (third die), and the fourth memory layer 86 (second die).
  • its flowchart is shown in FIG. 13.
  • a first transceiver circuit is formed in a first die-to-die region 812 in the first core layer 81 .
  • the second transceiver circuit is formed in the second die-to-die region 832 in the second core layer 83 .
  • TSVs for transmitting and receiving are formed in the first memory layer 82 and the fourth memory layer 86 .
  • the I/O TSVs are generated in the second core layer 83 , the second memory layer 84 and the fourth memory layer 86 .
  • physical TSVs are formed in the second core layer 83 , the second memory layer 84 and the fourth memory layer 86 .
  • step 1306 the first core layer 81 and the first memory layer 82 are bonded face to face.
  • step 1307 the third memory layer 85 and the first core layer 81 are bonded face to back.
  • step 1308 the second core layer 83 and the fourth memory layer 86 are bonded face to face.
  • step 1309 the second memory layer 84 and the second core layer 83 are bonded face to back.
  • in step 1310, the third memory layer 85, the first core layer 81, and the first memory layer 82 are stacked in order from top to bottom.
  • in step 1311, the fourth memory layer 86, the second core layer 83, and the second memory layer 84 are stacked in order from top to bottom.
  • step 1312 the first die group and the second die group are bonded back to back.
  • the first operation area 811 and the second operation area 831 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the first memory layer 82 and the fourth memory layer 86 electrically connect the first transceiver circuit and the second transceiver circuit; the data in the first memory area 821 is transmitted to the outside of the vertically stacked chip through the first input-output area 822 and the input-output TSVs, and the data in the second memory area 841 is transmitted to the outside of the vertically stacked chip through the second input-output area 842 and the input-output TSVs; the calculation results of the first operation area 811 are transmitted to the off-chip memory 404 through the first physical area 823 and the physical TSVs, and the calculation results of the second operation area 831 are transmitted to the off-chip memory 404 through the second physical area 843 and the physical TSVs.
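The bonding steps of the FIG. 13 flow (steps 1306-1312) pair each core layer face-to-face with its companion memory layer and join the two die groups back to back. A purely illustrative sketch of that sequence (names echo the text; nothing here defines the hardware):

```python
# Hypothetical condensation of the FIG. 13 bonding steps described above.
FIG13_BOND_STEPS = [
    (1306, "first core layer 81", "first memory layer 82", "face-to-face"),
    (1307, "third memory layer 85", "first core layer 81", "face-to-back"),
    (1308, "second core layer 83", "fourth memory layer 86", "face-to-face"),
    (1309, "second memory layer 84", "second core layer 83", "face-to-back"),
    (1312, "first die group", "second die group", "back-to-back"),
]

def bonds_of_kind(steps, kind):
    """Return the (a, b) pairs bonded with the given bonding process."""
    return [(a, b) for _, a, b, k in steps if k == kind]
```

This makes the symmetry visible: two face-to-face bonds (one per die group), two face-to-back bonds, and a single back-to-back bond joining the groups.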
  • another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 9. The vertically stacked chip of this embodiment is divided into a first die group, a second die group, and a third die group.
  • from top to bottom, the first die group consists of the first core layer 91 (first die) and the first memory layer 92 (second die), and the second die group consists of the second core layer 93 (first die) and the second memory layer 94 (second die); the third die group includes only the third memory layer 95.
  • Its flow chart is shown in Figure 14.
  • a first transceiver circuit is formed in a first die-to-die region 912 in the first core layer 91 .
  • the second transceiver circuit is formed in the second die-to-die region 932 in the second core layer 93 .
  • in step 1403, transceiver TSVs are formed in the first memory layer 92.
  • in step 1404, input-output TSVs are formed in the third memory layer 95.
  • in step 1405, physical TSVs are formed in the third memory layer 95.
  • the first core layer 91 and the first memory layer 92 are bonded face to face.
  • the second core layer 93 and the second memory layer 94 are bonded face to face.
  • step 1408 based on the order of the first core layer 91 and the first memory layer 92 , stacking is performed from top to bottom.
  • step 1409 based on the order of the second core layer 93 and the second memory layer 94, stacking is performed from top to bottom.
  • step 1410 the first die group and the second die group are bonded back to back.
  • step 1411 the third die group and the second die group are bonded face to back.
  • the third memory layer 95 includes a third memory area 951, a first input-output area 952, a second input-output area 953, a first physical access area 954, a second physical access area 955, and a fifth TSV 956;
  • the third memory area 951 is formed with storage units for temporarily storing the calculation results of the first operation circuit or the second operation circuit;
  • the first input-output area 952 is formed with a first input-output circuit serving as an interface for the first die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the second input-output area 953 is formed with a second input-output circuit serving as an interface for the second die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the first physical area 954 is formed with a first physical access circuit for connecting the first die group and the off-chip memory 404;
  • the second physical area 955 is formed with a second physical access circuit for connecting the second die group and the off-chip memory 404.
  • the first die-to-die area 912 and the second die-to-die area 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 is directly electrically connected to the second TSV 922 through the first TSV 913, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 15 shows the manufacturing method of back-to-back stacking in the foregoing embodiments.
  • step 1501 circuits are formed on the logic side of a first wafer.
  • Each wafer can be divided into a logic side and an opposite side.
  • the logic side refers to the side on which logic circuits are formed to achieve specific electrical functions, while the opposite side is the side of the wafer on which no logic circuits are laid out. Since forming the logic circuits involves processes such as deposition and etching on top of the wafer, in this step, as shown in FIG. 16, the logic side 1602 is on top of the first wafer 1601 and the opposite side 1603 is under the first wafer 1601.
  • specifically, a front end of line (FEOL) layer 1604 is formed on the logic side 1602, a first TSV 1605 is formed on the logic side 1602, and a back end of line (BEOL) layer 1606 is formed on the logic side 1602, so that the first TSV 1605 is electrically connected to the BEOL layer 1606.
  • the front end of line divides the regions for preparing transistors on the silicon substrate and then uses ion implantation to form N-type and P-type regions, realizing N-type and/or P-type field-effect transistors.
  • the back end of line consists of multiple layers of conductive metal wiring, which connects the transistors on the substrate according to the design requirements to achieve specific functions.
  • in this way, the FEOL layer 1604 and the BEOL layer 1606 are respectively formed.
  • the circuits on the logic side are mainly realized by the FEOL layer 1604, and the electrical connection of the elements in those circuits is realized by the BEOL layer 1606.
  • step 1502 the first wafer 1601 is tested to eliminate defective products.
  • wafer testing, also known as mid-test, aims to ensure that each chip basically meets the circuit characteristics or design specifications; it usually includes verification of voltage, current, timing, and electrical functions.
  • in step 1503, the first wafer 1601 is flipped over: each first wafer that was not eliminated is flipped 180 degrees. After the flip, as shown in FIG. 17, the logic side 1602 of the first wafer 1601 is at the bottom and the opposite side 1603 is at the top.
  • step 1504 a second wafer 1701 is bonded on the logic side 1602 to form the structure shown in FIG. 17 .
  • then the first TSV 1605 is exposed on the opposite side 1603.
  • specifically, the opposite side 1603 is ground, and the ground opposite side 1603 is chemical mechanical polished (CMP) to form the structure shown in FIG. 18.
  • the opposite side 1603 is then plasma etched after chemical mechanical polishing, so that the first TSV 1605 protrudes from the surface of the opposite side 1603, forming the structure shown in FIG. 19.
  • a silicon dioxide layer 2001 is deposited by low-temperature chemical vapor deposition, and the deposited surface is chemically mechanically polished to flatten the silicon dioxide layer 2001 and expose the first TSV 1605, forming the structure shown in FIG. 21.
  • next, the first wafer 1601 is diced into a plurality of first dies.
  • specifically, the first wafer 1601 and the second wafer 1701 are mounted on a frame 2201, the second wafer 1701 is supported by thimbles 2202, and the first wafer 1601 and the second wafer 1701 are then cut according to the size and position of the circuits, that is, along the dotted lines in the figure, finally producing a plurality of first dies 2203.
  • step 1507 the first die 2203 is flipped 180 degrees to form the structure shown in FIG. 23 .
  • step 1508 attach the opposite side of the first die to the opposite side of the second die so that the first TSV is in electrical communication with the second TSV of the second die.
  • the second die can be produced by prior-art manufacturing processes, and this embodiment does not limit how the second die is manufactured. As shown in FIG. 24, the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, so that the first TSV 1605 is electrically connected to the second TSV 2403 of the second die 2401.
  • at this point a back-to-back structure has been formed, that is, the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, and the circuits on the two logic sides are electrically connected through the first TSV 1605 and the second TSV 2403.
  • in step 1509, the first die 2203 is encapsulated with molding compound to form the structure shown in FIG. 25.
  • a direct bonding package may be used, in which the first die 2203 and the second die 2401 are directly bonded on a printed circuit board or covered with metal leads;
  • alternatively, organic resin is dripped around the first die 2203 to form a package body 2501 that covers it.
  • in step 1510, the plastic-encapsulated first die is ground.
  • in step 1511, the ground first die is chemically mechanically polished to form the structure shown in FIG. 26. At this point, the entire back-to-back stacking process is complete.
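The whole FIG. 15 back-to-back flow (steps 1501-1511) can be condensed into an ordered pipeline. The sketch below is illustrative only; the one-line step summaries are paraphrases of the text, not process specifications.

```python
# Hypothetical condensation of the FIG. 15 back-to-back stacking flow.
BACK_TO_BACK_FLOW = [
    (1501, "form circuits on logic side of first wafer (FEOL, first TSV, BEOL)"),
    (1502, "wafer test; eliminate defective products"),
    (1503, "flip first wafer 180 degrees"),
    (1504, "bond second wafer 1701 on logic side"),
    (1505, "expose first TSV: grind, CMP, plasma etch, oxide deposition, CMP"),
    (1506, "dice first wafer into first dies"),
    (1507, "flip first die 180 degrees"),
    (1508, "bond opposite sides of first and second die (TSV to TSV)"),
    (1509, "mold first die with molding compound"),
    (1510, "grind molded first die"),
    (1511, "CMP ground first die"),
]

def is_monotonic(flow):
    """Check that the step numbers strictly increase, i.e. the flow is ordered."""
    nums = [n for n, _ in flow]
    return all(a < b for a, b in zip(nums, nums[1:]))
```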
  • the solution of the present invention stacks the core layers vertically, bonds the processor core and the memory of the same die group face to face, and bonds adjacent die groups back to back, so that the transmission path from the die-to-die interface of the processor core to the memory is greatly shortened.
  • since the thickness of the logic side is only 0.3 microns and the thickness of the bonding layer is about 1 micron, the transmission path between the processor core and the memory can be shortened to about 1.6 microns, which helps to improve inter-core transmission efficiency.
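The 1.6-micron figure above can be checked arithmetically: a face-to-face bond crosses two logic-side thicknesses (about 0.3 µm each) plus the bonding layer (about 1 µm). A trivial sketch using the values quoted in the text:

```python
# Arithmetic check of the shortened face-to-face transmission path quoted
# above (values from the text: 0.3 um per logic side, ~1 um bonding layer).
LOGIC_SIDE_UM = 0.3
BONDING_LAYER_UM = 1.0

def face_to_face_path_um(logic_side=LOGIC_SIDE_UM, bond=BONDING_LAYER_UM):
    """Two logic-side thicknesses plus one bonding layer."""
    return 2 * logic_side + bond
```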
  • the electronic equipment or devices of the present invention may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, PC equipment, Internet of Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, household appliances, and/or medical equipment.
  • Said vehicles include airplanes, ships and/or vehicles;
  • said household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, range hoods;
  • said medical equipment includes nuclear magnetic resonance instruments, Ultrasound and/or electrocardiograph.
  • the electronic equipment or devices of the present invention can also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic equipment or devices of the present invention can also be used in cloud, edge, and terminal application scenarios related to artificial intelligence, big data, and/or cloud computing.
  • according to the solution of the present invention, electronic equipment or devices with high computing power can be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption can be applied to terminal devices and/or edge devices (such as smartphones or cameras).
  • the hardware information of the cloud device is compatible with that of the terminal device and/or the edge device, so that, according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device, thereby completing unified management, scheduling, and collaborative work of device-cloud integration or cloud-edge-terminal integration.
  • the present invention describes some methods and their embodiments as a series of actions and combinations thereof, but those skilled in the art will understand that the solution of the present invention is not limited by the order of the described actions. Therefore, according to the disclosure or teaching of the present invention, those skilled in the art will understand that some of the steps may be performed in another order or simultaneously. Further, those skilled in the art will understand that the embodiments described in the present invention may be regarded as optional embodiments, that is, the actions or modules involved are not necessarily required to realize one or more solutions of the present invention. In addition, depending on the scheme, the descriptions of some embodiments of the present invention emphasize different aspects; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit.
  • the aforementioned components or units may be located at the same location or distributed over multiple network units.
  • some or all of the units may be selected to achieve the purpose of the solutions described in the embodiments of the present invention.
  • multiple units in this embodiment of the present invention may be integrated into one unit, or each unit exists physically independently.
  • the above-mentioned integrated units may also be implemented in the form of hardware, that is, specific hardware circuits, which may include digital circuits and/or analog circuits.
  • the physical realization of the hardware structure of the circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors.
  • various devices such as computing devices or other processing devices described herein may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs.
  • the aforementioned storage unit or storage device can be any suitable storage medium (including magnetic or magneto-optical storage media, etc.), and may be, for example, resistive random-access memory (RRAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random-access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), ROM, RAM, and the like.
  • RRAM: resistive random-access memory
  • DRAM: dynamic random-access memory
  • SRAM: static random-access memory
  • EDRAM: enhanced dynamic random-access memory
  • HBM: high-bandwidth memory
  • HMC: hybrid memory cube
  • ROM: read-only memory
  • RAM: random-access memory
  • Clause A1. A vertically stacked chip, comprising: a first die group including a first die and a second die bonded in a face-to-face process; and a second die group including a first die and a second die bonded in a face-to-face process; wherein the first die group and the second die group are bonded in a back-to-back process.
  • Clause A2. The vertically stacked chip of Clause A1, wherein the first die is one of a processor core and a memory, and the second die is the other of the processor core and the memory.
  • Clause A3. The vertically stacked chip of Clause A2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed; wherein the processor cores of the first die group and the second die group perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  • Clause A4. The vertically stacked chip of Clause A3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and the memory of the first die group has transceiver through-silicon vias for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A5. The vertically stacked chip of Clause A4, wherein the memory of the first die group includes a first input-output area, and the processor core of the second die group and the memory of the second die group are provided with input-output through-silicon vias; data in the memory of the first die group is transmitted outside the vertically stacked chip through the first input-output area and the input-output through-silicon vias.
  • Clause A6. The vertically stacked chip of Clause A4, wherein the memory of the second die group includes a second input-output area, and data in the memory of the second die group is transmitted outside the vertically stacked chip through the second input-output area and the through-silicon vias.
  • Clause A7. The vertically stacked chip of Clause A4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, and the processor core of the second die group and the memory of the second die group have physical through-silicon vias; an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  • Clause A8. The vertically stacked chip of Clause A3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and the memory of the second die group has transceiver through-silicon vias for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A9. The vertically stacked chip of Clause A1, wherein the first die group further includes a third die bonded face-to-back with the first die of the first die group.
  • Clause A10. The vertically stacked chip of Clause A9, wherein the first die is a processor core, and the second die and the third die are memories.
  • Clause A11. The vertically stacked chip of Clause A1, further comprising a third die group bonded face-to-back with the second die group.
  • Clause A13. The vertically stacked chip of any one of Clauses A1 to A11, wherein the layers are packaged using CoWoS.
  • Clause A14. An integrated circuit device, comprising the vertically stacked chip of any one of Clauses A1 to A11.
  • Clause A16. A method of manufacturing a vertically stacked chip comprising a first die group and a second die group, the method comprising: bonding a first die and a second die in the first die group face-to-face; bonding a first die and a second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.
  • Clause A17. The method of Clause A16, wherein the first die is one of a processor core and a memory and the second die is the other of the processor core and the memory, the method further comprising: generating a first transceiver circuit in a first die-to-die area of the processor core of the first die group; and generating a second transceiver circuit in a second die-to-die area of the processor core of the second die group; wherein the processor cores of the first die group and the second die group perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  • Clause A18. The method of Clause A17, further comprising: generating transceiver through-silicon vias in the memory of the first die group; and placing the memory of the first die group between the processor core of the first die group and the processor core of the second die group; wherein the memory of the first die group electrically connects the first transceiver circuit and the second transceiver circuit through the transceiver through-silicon vias.
  • Clause A19. The method of Clause A18, wherein the memory of the first die group includes a first input-output area and the memory of the second die group includes a second input-output area, the method further comprising: generating input-output through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein data in the memory of the first die group is transmitted outside the vertically stacked chip through the first input-output area and the input-output through-silicon vias, and data in the memory of the second die group is transmitted outside the vertically stacked chip through the second input-output area and the input-output through-silicon vias.
  • Clause A20. The method of Clause A17, the vertically stacked chip being connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, the method further comprising: generating physical through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  • Clause A21. The method of Clause A16, further comprising: generating transceiver through-silicon vias in the memory of the second die group; and placing the memory of the second die group between the processor core of the first die group and the processor core of the second die group; wherein the transceiver through-silicon vias electrically connect the first transceiver circuit and the second transceiver circuit.
  • Clause A22. The method of Clause A16, wherein the first die group further includes a third die, the method further comprising: bonding the third die face-to-back to the first die of the first die group.
  • Clause A23. The method of Clause A22, wherein the first die is a processor core, and the second die and the third die are memories.
  • Clause A24. The method of Clause A16, the vertically stacked chip further comprising a third die group, the method further comprising: bonding the third die group face-to-back to the second die group.
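The stacking order recited above — face-to-face bonds inside each die group, back-to-back bonds between adjacent groups — can be sketched as a toy model. The `Die`/`DieGroup` classes and the `stack` helper below are illustrative inventions of this sketch, not part of the claimed process:

```python
# Toy model of the claimed stacking order (all names are illustrative only).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Die:
    name: str  # e.g. "processor core" or "memory"

@dataclass
class DieGroup:
    first: Die   # within a group, first and second are bonded face-to-face
    second: Die

def stack(groups: List[DieGroup]) -> List[Tuple[str, str]]:
    """Return the ordered list of bonds in the vertical stack:
    face-to-face inside each group, back-to-back between adjacent groups."""
    bonds = []
    for i, g in enumerate(groups):
        bonds.append((f"group{i}.{g.first.name}<->group{i}.{g.second.name}",
                      "face-to-face"))
        if i + 1 < len(groups):
            bonds.append((f"group{i}<->group{i+1}", "back-to-back"))
    return bonds

chip = [DieGroup(Die("processor core"), Die("memory")),
        DieGroup(Die("processor core"), Die("memory"))]
for pair, bond in stack(chip):
    print(pair, bond)
```

For the two-group chip above this yields two face-to-face bonds separated by one back-to-back bond, matching Clauses A1 and A16.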

Landscapes

  • Engineering & Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Manufacturing & Machinery (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Credit Cards Or The Like (AREA)

Abstract

A longitudinal (vertically) stacked chip, an integrated circuit device, a board, and a manufacturing method therefor. The integrated circuit device comprises a computing device (401), an interface device (402), and a processing device (403). The computing device (401) interacts with the processing device (403) to jointly complete a computing operation specified by a user. The integrated circuit device can further comprise an off-chip memory (404), which is connected to the computing device (401) and the processing device (403) respectively and stores data of the computing device (401) and the processing device (403).

Description

Vertically Stacked Chip, Integrated Circuit Device, Board, and Manufacturing Method Therefor

Cross-Reference to Related Applications

This application claims priority to Chinese patent application No. 202111172917.6, filed on October 8, 2021 and titled "Vertically Stacked Chip, Integrated Circuit Device, Board, and Manufacturing Method Therefor".

Technical Field

The present invention generally relates to the field of semiconductors. More specifically, the present invention relates to a vertically stacked chip, an integrated circuit device, a board, and a manufacturing method therefor.

Background

Since the advent of the big-data era, system-on-chips combined with artificial-intelligence technology have had to cope with increasingly complex environments and provide ever more functions, and current chip designs have approached the maximum reticle size. Developers therefore try to partition a system-on-chip into multi-chip modules, and the modules need to be connected over ultra-short and extra-short distances to achieve high-speed data transfer between dies. Besides extending bandwidth as much as possible, a die-to-die (D2D) connection is also an extremely low-latency, low-power solution.

A die-to-die interface is a functional block that occupies a small area of a die and provides a data interface between two modules, or two dies, assembled in the same package. A die-to-die interface uses very short channels to connect the modules or dies within a package, and its transfer rate and bandwidth exceed those of a traditional chip-to-chip interface.

In the prior art, two modules or dies connected by a die-to-die interface are usually placed side by side with their die-to-die interfaces adjacent, and the two die-to-die interfaces are electrically connected through an interposer layer underneath. Although the transfer rate and bandwidth of a die-to-die interface are excellent, when data is transferred through the underlying interposer the transmission path can reach the millimeter scale. Such a long transmission path attenuates the signal and lowers the data rate, and still cannot meet the requirements of compute-intensive workloads.

Therefore, a technical solution that shortens the transmission distance between dies is urgently needed.
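To put the two length scales side by side, here is a back-of-the-envelope comparison. The ~1 mm lateral interposer route is the millimeter-scale figure described above, and the ~1.6-micron vertical path is the face-to-face figure cited later in this application; both are order-of-magnitude values from the text, not measurements:

```python
# Order-of-magnitude comparison (illustrative values only):
# a ~1 mm lateral route through the interposer versus the ~1.6 um
# vertical path achieved by face-to-face stacking.
interposer_path_um = 1000.0  # ~1 mm expressed in microns
vertical_path_um = 1.6       # face-to-face path cited in this application

ratio = interposer_path_um / vertical_path_um
print(round(ratio))  # 625
```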
Summary of the Invention

To at least partially solve the technical problems mentioned in the background, the present invention provides a vertically stacked chip, an integrated circuit device, a board, and a manufacturing method therefor.

In one aspect, the present invention discloses a vertically stacked chip including a first die group and a second die group. The first die group includes a first die and a second die bonded in a face-to-face process, the second die group includes a first die and a second die bonded in a face-to-face process, and the first die group and the second die group are bonded in a back-to-back process.

In another aspect, the present invention discloses an integrated circuit device including the aforementioned vertically stacked chip, and also discloses a board including the aforementioned integrated circuit device.

In another aspect, the present invention discloses a method of manufacturing a vertically stacked chip including a first die group and a second die group. The method includes: bonding a first die and a second die in the first die group face-to-face; bonding a first die and a second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.

The present invention bonds the dies of the same die group face-to-face and bonds adjacent die groups back-to-back, so that the transmission path between the dies of the same die group is greatly shortened, which helps to improve the transmission efficiency within a die group.
Brief Description of the Drawings

The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of illustration and not limitation, and the same or corresponding reference numerals indicate the same or corresponding parts. In the drawings:

FIG. 1 shows a top view of the layout of a package structure including a die-to-die interface;
FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line;
FIG. 3 is a structural diagram of a board according to an embodiment of the present invention;
FIG. 4 is a structural diagram of an integrated circuit device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 6 is a cross-sectional view of the structure of FIG. 5;
FIG. 7 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 9 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 11 is a flowchart of making the vertically stacked chip of FIG. 5 according to another embodiment of the present invention;
FIG. 12 is a flowchart of making the vertically stacked chip of FIG. 7 according to another embodiment of the present invention;
FIG. 13 is a flowchart of making the vertically stacked chip of FIG. 8 according to another embodiment of the present invention;
FIG. 14 is a flowchart of making the vertically stacked chip of FIG. 9 according to another embodiment of the present invention;
FIG. 15 is a flowchart of realizing back-to-back stacking according to another embodiment of the present invention;
FIG. 16 is a cross-sectional view illustrating step 1501;
FIG. 17 is a cross-sectional view illustrating step 1504;
FIG. 18 is a cross-sectional view illustrating step 1505;
FIG. 19 is a cross-sectional view illustrating step 1505;
FIG. 20 is a cross-sectional view illustrating step 1505;
FIG. 21 is a cross-sectional view illustrating step 1505;
FIG. 22 is a cross-sectional view illustrating step 1506;
FIG. 23 is a cross-sectional view illustrating step 1507;
FIG. 24 is a cross-sectional view illustrating step 1508;
FIG. 25 is a cross-sectional view illustrating step 1509; and
FIG. 26 is a cross-sectional view illustrating step 1511.
Detailed Description

The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be understood that the terms "first", "second", "third", and "fourth" in the claims, description, and drawings of the present invention are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" used in the description and claims indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in this description are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in the description and claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly dictates otherwise. It should further be understood that the term "and/or" used in the description and claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this description and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A die-to-die interface, like any other chip-to-chip interface, establishes a data link between two dies. A die-to-die interface is logically divided into a physical layer, a link layer, and a transaction layer, and provides a standardized parallel interface that connects to the internal interconnect fabric.

FIG. 1 shows a top view of the layout of a package structure including a die-to-die interface. The layout is located in a molding compound area 10 of a wafer. The molding compound area 10 includes a system area and storage areas. In this example, the system area is located in the center of the molding compound area 10 and holds two system-on-chips 101, and the storage areas are located on either side of the system area and hold eight off-chip memories 102.

The system area also has a die-to-die area 103, a physical area 104, and an input-output area 105. A transceiver circuit is formed in the die-to-die area 103 for data sharing between the two system-on-chips 101; a physical access circuit is formed in the physical area 104 for accessing the off-chip memories 102; and input-output circuits are formed in the input-output area 105 as the external interface of the system-on-chips 101.

A memory 106 is also placed in the system area as scratch space for the system-on-chips 101; its capacity is smaller than that of the off-chip memory 102, but its data transfer rate is higher.

FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line. As shown, the system area is divided into two layers: the upper layer is the system-on-chip 101, and the lower layer holds the transceiver circuit of the die-to-die area 103, the memory 106, and the input-output circuit of the input-output area 105. The package structure further includes an interposer 201 and a substrate 202, with the interposer 201 disposed on the substrate 202. When the two system-on-chips 101 transfer data, the path is: sending system-on-chip 101 → transceiver circuit of the sending die-to-die area 103 → interposer 201 → transceiver circuit of the receiving die-to-die area 103 → receiving system-on-chip 101, thereby achieving the low-latency and low-power technical effect of the die-to-die port.
FIG. 3 shows a schematic structural diagram of a board 30 according to an embodiment of the present invention. As shown in FIG. 3, the board 30 includes a chip 301, which is a system-on-chip integrating one or more combined processing devices. A combined processing device is an artificial-intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining. Deep learning technology in particular is widely applied in the field of cloud intelligence; a notable feature of cloud intelligence applications is the large amount of input data, which places high demands on the storage capacity and computing power of the platform. The board 30 of this embodiment is suitable for cloud intelligence applications and has huge off-chip storage, on-chip storage, and powerful computing capabilities.

The chip 301 is connected to an external device 303 through an external interface device 302. The external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. Data to be processed can be transferred from the external device 303 to the chip 301 through the external interface device 302, and the computation results of the chip 301 can be transferred back to the external device 303 via the external interface device 302. Depending on the application scenario, the external interface device 302 can have different interface forms, such as a PCIe interface.

In more detail, the chip 301 includes a computing device and a processing device. The computing device is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning computations. The processing device, as a general-purpose processing device, performs basic control including, but not limited to, data transfer and starting and/or stopping the computing device. Depending on the implementation, the processing device may be one or more types of central processing unit (CPU), graphics processing unit (GPU), or other general-purpose and/or special-purpose processors, including but not limited to digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components, and their number can be determined according to actual needs. As mentioned above, the computing device of this embodiment, considered alone, can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device and the processing device are considered together, they are regarded as forming a heterogeneous multi-core structure.

The board 30 also includes a storage device 304 for storing data, which includes one or more storage units 305. The storage device 304 is connected to and transfers data with the control device 306 and the chip 301 through a bus. The control device 306 on the board 30 is configured to regulate the state of the chip 301. To this end, in one application scenario, the control device 306 may include a microcontroller unit (MCU).
FIG. 4 shows the structure of the combined processing device on the board 30. The combined processing device 40 includes a computing device 401, an interface device 402, a processing device 403, and an off-chip memory 404.
The computing device 401 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning computations. It can interact with the processing device 403 through the interface device 402 to jointly complete the user-specified operations.
The interface device 402 is connected to the bus for connecting with other devices, such as the control device 306 and the external interface device 302 in FIG. 3.
The processing device 403, as a general-purpose processing device, performs basic control including but not limited to data transfer and starting and/or stopping the computing device 401. Depending on the implementation, the processing device 403 may be one or more of a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors, including but not limited to digital signal processors, application-specific integrated circuits, field-programmable gate arrays, other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components; their number can be determined according to actual needs. As mentioned above, the computing device 401 of this embodiment, considered on its own, can be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing device 401 and the processing device 403 are considered together, they are regarded as forming a heterogeneous multi-core structure.
The off-chip memory 404 is used to store data to be processed. It is a DDR memory, typically 16 GB or larger, and stores data of the computing device 401 and/or the processing device 403.
FIG. 5 shows a schematic diagram of the vertical stacking of an embodiment of the present invention. This embodiment is a multi-core chip including a first die group and a second die group. The first die group includes a first core layer 51 and a first memory layer 52, and the second die group includes a second core layer 53 and a second memory layer 54. In practice, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are vertically stacked together in this order; the layers in FIG. 5 are drawn visually separated only for convenience of illustration.
The first core layer 51 implements the function of a processor core and includes a first computing area 511, which covers the logic layer of the first core layer 51, i.e., the top side of the first core layer 51 in the figure. In specific regions, the first core layer 51 further includes a first die-to-die area 512 and first through-silicon vias (TSVs) 513. The first computing area 511 is formed with a first computing circuit to implement the function of the computing device 401; the first die-to-die area 512 is formed with a first transceiver circuit serving as the die-to-die interface of the first computing circuit; the first TSVs 513 are used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
The first memory layer 52 implements the function of on-chip memory and includes a first memory area 521, a first input/output area 522, a first physical area 523, and second TSVs 524. The first memory area 521 is formed with storage units for temporarily storing the computation results of the first computing circuit. The first input/output area 522 is formed with a first input/output circuit serving as the external interface of the first core layer 51 and the first memory layer 52, i.e., implementing the function of the interface device 402. The first physical area 523 is formed with a first physical access circuit for accessing the off-chip memory 404. The second TSVs 524 are distributed over the entire first memory layer 52 but are shown on one side only for illustration; they electrically connect specific components.
The second core layer 53 implements the function of a processor core and includes a second computing area 531, which covers the logic layer of the second core layer 53, i.e., the top side of the second core layer 53 in the figure. In specific regions, the second core layer 53 further includes a second die-to-die area 532 and third TSVs 533. The second computing area 531 is formed with a second computing circuit to implement the function of the processing device 403; the second die-to-die area 532 is formed with a second transceiver circuit serving as the die-to-die interface of the second computing circuit; the third TSVs 533 are likewise used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
The second memory layer 54 implements the function of on-chip memory and includes a second memory area 541, a second input/output area 542, a second physical area 543, and fourth TSVs 544. The second memory area 541 is formed with storage units for temporarily storing the computation results of the second computing circuit. The second input/output area 542 is formed with a second input/output circuit serving as the external interface of the second core layer 53 and the second memory layer 54, i.e., implementing the function of the interface device 402. The second physical area 543 is formed with a second physical access circuit for accessing the off-chip memory 404. The fourth TSVs 544 are distributed over the entire second memory layer 54 but are shown on one side only for illustration; they electrically connect specific components.
Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 511 → first transceiver circuit in the first die-to-die area 512 → transceiver TSVs of the first TSVs 513 → transceiver TSVs of the second TSVs 524 → second transceiver circuit in the second die-to-die area 532 → second computing circuit in the second computing area 531. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
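The forward and reverse transfers described above can be summarized with a small software model. This is an illustrative sketch only: the element labels are hypothetical strings naming the patent's numbered components, not an actual hardware API.

```python
# Illustrative model of the die-to-die path of the FIG. 5 embodiment.
# Element names are hypothetical labels for the patent's numbered parts.

DIE_TO_DIE_PATH = [
    "first computing circuit (computing area 511)",
    "first transceiver circuit (die-to-die area 512)",
    "transceiver TSV (first TSVs 513)",
    "transceiver TSV (second TSVs 524)",
    "second transceiver circuit (die-to-die area 532)",
    "second computing circuit (computing area 531)",
]

def route(path, forward=True):
    """Return the sequence of elements a transfer traverses.

    forward=True models computing device 401 -> processing device 403;
    forward=False models the reverse path described in the text.
    """
    return list(path) if forward else list(reversed(path))

if __name__ == "__main__":
    print(" -> ".join(route(DIE_TO_DIE_PATH)))
    print(" -> ".join(route(DIE_TO_DIE_PATH, forward=False)))
```

The same reversal rule applies to the input/output and physical-access paths in the paragraphs that follow: each downstream path is traversed element-by-element in the opposite order for upstream transfers.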
When the computation results of the computing device 401 need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: first input/output circuit in the first input/output area 522 → input/output TSVs of the second TSVs 524 → input/output TSVs of the third TSVs 533 → input/output TSVs of the fourth TSVs 544. When another off-chip device intends to transmit data to the first memory area 521, the data reaches the first memory area 521 through the reverse of this path. When the computation results of the processing device 403 need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: second input/output circuit in the second input/output area 542 → input/output TSVs of the fourth TSVs 544. When another off-chip device intends to transmit data to the second memory area 541, the data reaches the second memory area 541 through the reverse of this path.
When data in the first memory area 521 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit in the first physical area 523 → physical TSVs of the second TSVs 524 → physical TSVs of the third TSVs 533 → physical TSVs of the fourth TSVs 544. When the off-chip memory 404 intends to transmit input data to the first memory area 521 for processing by the computing device 401, the data reaches the first memory area 521 through the reverse of this path. When data in the second memory area 541 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit in the second physical area 543 → physical TSVs of the fourth TSVs 544. When the off-chip memory 404 intends to transmit input data to the second memory area 541 for processing by the processing device 403, the data reaches the second memory area 541 through the reverse of this path.
Each layer can be divided into a logic side and an opposite side. The logic side carries the logic circuits that implement the layer's functions; the opposite side is the side of the layer on which no logic circuits are laid out. FIG. 6 shows a cross-sectional view of the structure of FIG. 5. In this embodiment, the first core layer 51 is used together with the first memory layer 52, and the second core layer 53 is used together with the second memory layer 54. For transmission efficiency, the first core layer 51 and the first memory layer 52 adopt a face-to-face bonding process: the logic side of the first core layer 51 on which the first computing area 511 is formed is bonded to the logic side of the first memory layer 52 on which the first memory area 521 is formed, so that the transmission path between the first computing circuit and the first memory area 521 is the shortest. Likewise, the logic side of the second core layer 53 on which the second computing area 531 is formed is bonded to the logic side of the second memory layer 54 on which the second memory area 541 is formed, so that the transmission path between the second computing circuit and the second memory area 541 is also the shortest. To achieve these shortest transmission paths, the first die group and the second die group adopt a back-to-back bonding process, i.e., the opposite side of the first memory layer 52 is bonded to the opposite side of the second core layer 53.
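The bonding rule above can be checked mechanically: each layer exposes a face (logic side) and a back (opposite side), and the bond between two adjacent layers is classified by which sides meet. The following sketch encodes the FIG. 6 stack; the tuple convention (layer, side facing up, side facing down) is an assumption made for this illustration.

```python
# Minimal sketch of the FIG. 6 bonding arrangement. The (name, top_side,
# bottom_side) encoding is a hypothetical convention for this sketch.

# Stack from top to bottom; "face" = logic side, "back" = opposite side.
stack = [
    ("first core layer 51",    "back", "face"),  # logic side faces down
    ("first memory layer 52",  "face", "back"),  # logic side faces up
    ("second core layer 53",   "back", "face"),
    ("second memory layer 54", "face", "back"),
]

def bond_type(upper, lower):
    """Classify the bond between two stacked layers by the sides that meet."""
    meeting = (upper[2], lower[1])  # bottom of upper layer, top of lower layer
    return {
        ("face", "face"): "face-to-face",
        ("back", "back"): "back-to-back",
    }.get(meeting, "face-to-back")

for upper, lower in zip(stack, stack[1:]):
    print(f"{upper[0]} / {lower[0]}: {bond_type(upper, lower)}")
```

Running the loop reproduces the text's arrangement: face-to-face within each die group, back-to-back between the first memory layer 52 and the second core layer 53.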
With the arrangement shown in FIG. 6, the first die-to-die area 512 and the second die-to-die area 532 are vertically stacked, so that the die-to-die interface of the first core layer 51 and the die-to-die interface of the second core layer 53 are directly electrically connected through the first TSVs 513 and the second TSVs 524, without requiring transmission through an interposer such as the interposer 201 shown in FIG. 2.
In summary, the first die group of this embodiment includes a first die and a second die bonded with a face-to-face process, and the second die group of this embodiment likewise includes a first die and a second die bonded with a face-to-face process, while the first die group and the second die group are bonded with a back-to-back process. The first die may be either a processor core or a memory, and the second die is the other of the processor core and the memory; the two are used together.
In another case, the positions of the first core layer 51 and the first memory layer 52 of the first die group can be swapped, and the positions of the second core layer 53 and the second memory layer 54 of the second die group can be swapped, as shown in FIG. 7. In this structure, the second memory layer 54 of the second die group is located between the first core layer 51 of the first die group and the second core layer 53 of the second die group, and the second memory layer 54 is formed with transceiver TSVs for electrically connecting the first transceiver circuit and the second transceiver circuit.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 511 → first transceiver circuit in the first die-to-die area 512 → transceiver TSVs of the first TSVs 513 → transceiver TSVs of the fourth TSVs 544 → second transceiver circuit in the second die-to-die area 532 → second computing circuit in the second computing area 531. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
In the structure shown in FIG. 7, the paths by which the computing device 401 or the processing device 403 exchanges data with other off-chip devices through the interface device 402, and the paths by which the first memory area 521 or the second memory area 541 transmits data to and from the off-chip memory 404, are similar to those of the embodiment of FIG. 5 and can be readily deduced by those skilled in the art, so they are not described again.
Another embodiment of the present invention likewise implements the structure shown in FIG. 4. FIG. 8 shows a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. From top to bottom, the first die group consists of a third memory layer 85 (third die), a first core layer 81 (first die), and a first memory layer 82 (second die); from top to bottom, the second die group consists of a fourth memory layer 86 (second die), a second core layer 83 (first die), and a second memory layer 84 (third die), i.e., the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83. The layers in FIG. 8 are drawn visually separated only for convenience of illustration.
The functions and roles of the first core layer 81, the first memory layer 82, the second core layer 83, and the second memory layer 84 are the same as those of the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 in the foregoing embodiment, so they are not described again.
The third memory layer 85 includes a third memory area 851 and fifth TSVs 852. The third memory area 851 covers the logic layer of the third memory layer 85, i.e., the top side of the third memory layer 85 in the figure, and is formed with storage units for temporarily storing the computation results of the first computing circuit; the fifth TSVs 852 are distributed over the entire third memory layer 85 but are shown on one side only for illustration, and electrically connect specific components. The third memory layer 85 is only responsible for temporarily storing the computation results of the first computing circuit and is not responsible for the external communication tasks of the first die group. The first computing circuit can use the temporary storage space of both the first memory area 821 and the third memory area 851: when the computing device 401 intends to temporarily store intermediate data, it can store the data in the third memory area 851 through the fifth TSVs 852, or in the first memory area 821 through the first TSVs 813.
The fourth memory layer 86 includes a fourth memory area 861 and sixth TSVs 862. The fourth memory area 861 covers the logic layer of the fourth memory layer 86, i.e., the top side of the fourth memory layer 86 in the figure, and is formed with storage units for temporarily storing the computation results of the second computing circuit; the sixth TSVs 862 are distributed over the entire fourth memory layer 86 but are shown on one side only for illustration, and electrically connect specific components. The fourth memory layer 86 is only responsible for temporarily storing the computation results of the second computing circuit and is not responsible for the external communication tasks of the second die group. The second computing circuit can use the temporary storage space of both the second memory area 841 and the fourth memory area 861: when the processing device 403 intends to temporarily store intermediate data, it can store the data in the fourth memory area 861 through the sixth TSVs 862, or in the second memory area 841 through the second TSVs 833.
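In the FIG. 8 embodiment each core layer thus has two scratch destinations reachable over different TSVs. A minimal sketch of that choice is given below; the capacity figures and the first-fit policy are hypothetical assumptions for illustration, since the patent only specifies that both memory areas are available, not how one is selected.

```python
# Illustrative sketch: a core layer spilling intermediate data to one of
# the two scratch areas described for the FIG. 8 embodiment. Capacities
# and the first-fit policy are hypothetical, not specified by the patent.

SCRATCH_AREAS = {
    # area name: (TSVs traversed, hypothetical free capacity in KiB)
    "first memory area 821": (["first TSVs 813"], 512),
    "third memory area 851": (["fifth TSVs 852"], 512),
}

def choose(areas, size_kib):
    """Pick the first scratch area with enough free space (first fit)."""
    for name, (tsvs, free_kib) in areas.items():
        if free_kib >= size_kib:
            return name, tsvs
    raise MemoryError("no scratch area can hold the intermediate data")

if __name__ == "__main__":
    area, tsvs = choose(SCRATCH_AREAS, 256)
    print(f"store intermediate data in {area} via {tsvs[0]}")
```

The same sketch applies symmetrically to the second die group, with the second memory area 841 (via the second TSVs 833) and the fourth memory area 861 (via the sixth TSVs 862) as the two destinations.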
Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 811 → first transceiver circuit in the first die-to-die area 812 → transceiver TSVs of the first TSVs 813 → transceiver TSVs of the third TSVs 824 → transceiver TSVs of the sixth TSVs 862 → second transceiver circuit in the second die-to-die area 832 → second computing circuit in the second computing area 831. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
When the computation results of the first die group need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: first input/output circuit in the first input/output area 822 → input/output TSVs of the third TSVs 824 → input/output TSVs of the sixth TSVs 862 → input/output TSVs of the second TSVs 833 → input/output TSVs of the fourth TSVs 844. When another off-chip device intends to transmit data to the first die group, the data reaches the first memory area 821 through the reverse of this path. When the computation results of the second die group need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: second input/output circuit in the second input/output area 842 → input/output TSVs of the fourth TSVs 844. When another off-chip device intends to transmit data to the second die group, the data reaches the second memory area 841 through the reverse of this path.
When data of the first die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit in the first physical area 823 → physical TSVs of the third TSVs 824 → physical TSVs of the sixth TSVs 862 → physical TSVs of the second TSVs 833 → physical TSVs of the fourth TSVs 844. When the off-chip memory 404 intends to transmit input data to the first die group for processing by the computing device 401, the data reaches the first memory area 821 through the reverse of this path. When data of the second die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit in the second physical area 843 → physical TSVs of the fourth TSVs 844. When the off-chip memory 404 intends to transmit input data to the second die group for processing by the processing device 403, the data reaches the second memory area 841 through the reverse of this path.
In this embodiment, the first core layer 81 is used together with the first memory layer 82 and the third memory layer 85, and the second core layer 83 is used together with the second memory layer 84 and the fourth memory layer 86. For transmission efficiency, the first core layer 81 and the first memory layer 82 adopt a face-to-face bonding process, so that the transmission path between the first computing circuit and the first memory area 821 is the shortest; the first core layer 81 and the third memory layer 85 adopt a face-to-back bonding process; the second core layer 83 and the fourth memory layer 86 adopt a face-to-face bonding process, likewise making the transmission path between the second computing circuit and the fourth memory area 861 the shortest; the second core layer 83 and the second memory layer 84 adopt a face-to-back bonding process; and the first die group and the second die group adopt a back-to-back bonding process, i.e., the first memory layer 82 and the fourth memory layer 86 are bonded back-to-back.
As shown in FIG. 8, the first die-to-die area 812 and the second die-to-die area 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 and the die-to-die interface of the second core layer 83 are directly electrically connected through the first TSVs 813, the third TSVs 824, and the sixth TSVs 862, without requiring transmission through an interposer such as the interposer 201 shown in FIG. 2.
Another embodiment of the present invention likewise implements the structure shown in FIG. 4. FIG. 9 shows a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. From top to bottom, the first die group consists of a first core layer 91 (first die) and a first memory layer 92 (second die); from top to bottom, the second die group consists of a second core layer 93 (first die) and a second memory layer 94 (second die); the third die group includes only a third memory layer 95, so the third memory layer 95 is located under the second memory layer 94. The layers in FIG. 9 are drawn visually separated only for convenience of illustration.
The first core layer 91 includes a first computing area 911, which covers the logic layer of the first core layer 91, i.e., the top side of the first core layer 91 in the figure; in specific regions, the first core layer 91 further includes a first die-to-die area 912 and first TSVs 913. The first memory layer 92 includes a first memory area 921 and second TSVs 922; the first memory area 921 covers the logic layer of the first memory layer 92, i.e., the top side of the first memory layer 92 in the figure, and is formed with storage units for temporarily storing the computation results of the first computing circuit. The second core layer 93 includes a second computing area 931, which covers the logic layer of the second core layer 93, i.e., the top side of the second core layer 93 in the figure; in specific regions, the second core layer 93 further includes a second die-to-die area 932 and third TSVs 933. The second memory layer 94 includes a second memory area 941 and fourth TSVs 942; the second memory area 941 covers the logic layer of the second memory layer 94, i.e., the top side of the second memory layer 94 in the figure, and is formed with storage units for temporarily storing the computation results of the second computing circuit.
The third memory layer 95 includes a third memory area 951, a first input/output area 952, a second input/output area 953, a first physical area 954, a second physical area 955, and fifth TSVs 956. The third memory area 951 is formed with storage units for temporarily storing the computation results of the first computing circuit or the second computing circuit. The first input/output area 952 is formed with a first input/output circuit serving as the external interface of the first die group, i.e., implementing the function of the interface device 402; the second input/output area 953 is formed with a second input/output circuit serving as the external interface of the second die group, likewise implementing the function of the interface device 402. The first physical area 954 is formed with a first physical access circuit connecting the first die group with the off-chip memory 404; the second physical area 955 is formed with a second physical access circuit connecting the second die group with the off-chip memory 404.
Each set of TSVs is distributed over its entire layer but is shown on one side only for illustration. Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: the first operation circuit in the first operation area 911 → the first transceiver circuit in the first die-to-die area 912 → the transceiver TSVs of the first TSVs 913 → the transceiver TSVs of the second TSVs 922 → the second transceiver circuit in the second die-to-die area 932 → the second operation circuit in the second operation area 931. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
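The hop sequence above can be sketched as a small routing model; the hop labels below are our shorthand for the areas named in this paragraph, not terms used in the patent, and the model only records hop order.

```python
# Minimal sketch of the die-to-die path (hop labels are illustrative shorthand).
FORWARD_PATH = [
    "first operation circuit (operation area 911)",
    "first transceiver circuit (die-to-die area 912)",
    "transceiver TSV of first TSVs 913",
    "transceiver TSV of second TSVs 922",
    "second transceiver circuit (die-to-die area 932)",
    "second operation circuit (operation area 931)",
]

def route(data, path):
    """Forward `data` through each hop in order (hops are no-ops in this model)."""
    for hop in path:
        pass  # real hardware would latch and forward the signal at each hop
    return data

# Computing device 401 -> processing device 403:
assert route("payload", FORWARD_PATH) == "payload"
# Processing device 403 -> computing device 401 uses the reversed hop order:
assert list(reversed(FORWARD_PATH))[0] == "second operation circuit (operation area 931)"
```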
The first die group and the second die group do not communicate off-chip directly; when off-chip communication is required, this embodiment performs it through the third memory layer 95 of the third die group.
When the computation results of the computing device 401 need to be exchanged with other off-chip devices through the interface device 402, the data is transmitted through the input/output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices along the following path: the first input/output circuit of the first input/output area 952 → the first input/output TSVs of the fifth TSVs 956. When another off-chip device intends to transmit data to the first die group, the data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the first memory area 921.
When the computation results of the processing device 403 need to be exchanged with other off-chip devices through the interface device 402, the data is transmitted through the input/output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices along the following path: the second input/output circuit of the second input/output area 953 → the second input/output TSVs of the fifth TSVs 956. When another off-chip device intends to transmit data to the second die group, the data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the second memory area 941.
When data in the first memory area 921 is to be transmitted to the off-chip memory 404, the data is transmitted through the physical TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 off-chip along the following path: the first physical access circuit of the first physical access area 954 → the first physical TSVs of the fifth TSVs 956. When the off-chip memory 404 intends to transmit input data to the first die group, the input data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the first memory area 921.
When data in the second memory area 941 is to be transmitted to the off-chip memory 404, the data is transmitted through the physical TSVs of the fourth TSVs to the third memory area 951 for temporary storage, and then travels from the third memory area 951 off-chip along the following path: the second physical access circuit of the second physical access area 955 → the second physical TSVs of the fifth TSVs 956. When the off-chip memory 404 intends to transmit input data to the second die group, the input data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 through the physical TSVs of the fourth TSVs to the second memory area 941.
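The four off-chip flows above share one pattern: traffic is always staged in the third memory area 951 before crossing the chip boundary in either direction. A hedged sketch of that pattern follows; the class and function names are invented for illustration and are not taken from the patent.

```python
# Hedged sketch: off-chip traffic is buffered in the third memory area 951
# before leaving through (or after entering via) the fifth TSVs 956.
class ThirdMemoryArea:
    """Stand-in for the staging buffer in memory area 951 (FIFO order)."""
    def __init__(self):
        self._buffer = []

    def stage(self, data):
        self._buffer.append(data)

    def drain(self):
        return self._buffer.pop(0)

def send_off_chip(data, staging, off_chip):
    staging.stage(data)               # buffer in area 951 first
    off_chip.append(staging.drain())  # then out through the I/O or physical TSVs

def receive_from_off_chip(data, staging, memory_area):
    staging.stage(data)                  # reverse path: buffer in area 951
    memory_area.append(staging.drain())  # then forward to area 921 or 941

staging = ThirdMemoryArea()
off_chip_memory_404 = []
send_off_chip("result", staging, off_chip_memory_404)
assert off_chip_memory_404 == ["result"]

first_memory_area_921 = []
receive_from_off_chip("input", staging, first_memory_area_921)
assert first_memory_area_921 == ["input"]
```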
In this embodiment, the first core layer 91 works in conjunction with the first memory layer 92, and the second core layer 93 works in conjunction with the second memory layer 94. For transmission efficiency, the first core layer 91 and the first memory layer 92 are bonded face-to-face, which minimizes the transmission path between the first operation circuit and the first memory area 921; the second core layer 93 and the second memory layer 94 are likewise bonded face-to-face, minimizing the transmission path between the second operation circuit and the second memory area 941. To preserve these shortest transmission paths, the first die group and the second die group are bonded back-to-back, i.e., the first memory layer 92 and the second core layer 93 are bonded back-to-back, while the second die group and the third die group are bonded face-to-back, i.e., the second memory layer 94 and the third memory layer 95 are bonded face-to-back.
As shown in FIG. 9, the first die-to-die area 912 and the second die-to-die area 932 are stacked vertically, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are electrically connected directly through the first TSVs 913 and the second TSVs 922, without routing through an interposer such as the interposer 201 shown in FIG. 2.
Another embodiment of the present invention likewise realizes the structure shown in FIG. 4. FIG. 10 is a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. The first die group consists, from top to bottom, of a third memory layer B and a first core layer A; the second die group consists, from top to bottom, of a first memory layer D and a second core layer C; the third die group includes only a second memory layer E. Evidently, the vertical stacking structure of this embodiment differs from the embodiment of FIG. 9 only in that the positions of the core layer and the memory layer within each of the first and second die groups are swapped. Based on the description of the foregoing embodiments, those skilled in the art can understand how the layers of this embodiment cooperate without creative effort, so the details are not repeated here.
The embodiments above are all vertically stacked systems-on-chip, which can be realized with FCBGA (flip chip ball grid array) or CoWoS (chip on wafer on substrate) packaging processes. FCBGA is a flip-chip ball grid array packaging format that uses solder balls instead of pins to connect circuits, providing the shortest external connection distance. This package not only offers excellent electrical performance but also reduces interconnect losses and inductance between components, mitigates electromagnetic interference, and withstands higher operating frequencies. CoWoS is an integrated production technology: the dies are first attached to a silicon wafer through the CoW packaging process, and the CoW assembly is then attached to a substrate to form CoWoS. This technology allows multiple dies to be packaged together, achieving a small package volume, low power consumption, and a low pin count.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 5. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes a first core layer 51 (first die) and a first memory layer 52 (second die), and the second die group includes a second core layer 53 (first die) and a second memory layer 54 (second die). Alternatively, the first die may be a memory and the second die may be a processor core. The flowchart is shown in FIG. 11.
In step 1101, a first transceiver circuit is formed in the first die-to-die area 512 of the first core layer 51. In step 1102, a second transceiver circuit is formed in the second die-to-die area 532 of the second core layer 53. In step 1103, transceiver TSVs are formed in the first memory layer 52. In step 1104, input/output TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1105, physical TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1106, the first memory layer 52 is disposed between the first core layer 51 and the second core layer 53, i.e., the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are stacked in that order from top to bottom. In step 1107, the first core layer 51 and the first memory layer 52 are bonded face-to-face. In step 1108, the second core layer 53 and the second memory layer 54 are bonded face-to-face. In step 1109, the first die group and the second die group are bonded back-to-back.
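The bonding requirements of steps 1107 through 1109 can be checked with a small orientation model. The layer names follow FIG. 5, but the encoding of each layer's logic "face" as pointing up or down is our own illustration, not a convention from the patent.

```python
# Hedged sketch: the bond type at each interface follows from which surfaces
# meet there (the upper layer's bottom surface against the lower layer's top).
STACK = [                       # top to bottom: (layer, face direction)
    ("core layer 51",   "down"),
    ("memory layer 52", "up"),
    ("core layer 53",   "down"),
    ("memory layer 54", "up"),
]

def bond_type(upper, lower):
    """Classify the interface between two vertically adjacent layers."""
    upper_surface = "face" if upper[1] == "down" else "back"
    lower_surface = "face" if lower[1] == "up" else "back"
    return f"{upper_surface}-to-{lower_surface}"

bonds = [bond_type(STACK[i], STACK[i + 1]) for i in range(len(STACK) - 1)]
# Face-to-face within each die group, back-to-back between the two groups:
assert bonds == ["face-to-face", "back-to-back", "face-to-face"]
```

This makes explicit why flipping one die group is what yields the back-to-back interface in the middle of the stack.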
With this structure, the first operation area 511 and the second operation area 531 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the first memory layer 52 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 521 is transmitted out of the vertically stacked chip through the first input/output area 522 and the input/output TSVs, and data in the second memory area 541 is transmitted out of the vertically stacked chip through the second input/output area 542 and the input/output TSVs; the operation results of the first operation area 511 are transmitted to the off-chip memory 404 through the first physical access area 523 and the physical TSVs, and the operation results of the second operation area 531 are transmitted to the off-chip memory 404 through the second physical access area 543 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 7. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes a first core layer 51 (first die) and a first memory layer 52 (second die), and the second die group includes a second core layer 53 (first die) and a second memory layer 54 (second die). Alternatively, the first die may be a memory and the second die may be a processor core. The flowchart is shown in FIG. 12.
In step 1201, a first transceiver circuit is formed in the first die-to-die area 512 of the first core layer 51. In step 1202, a second transceiver circuit is formed in the second die-to-die area 532 of the second core layer 53. In step 1203, transceiver TSVs are formed in the second memory layer 54. In step 1204, input/output TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1205, physical TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1206, the second memory layer 54 is disposed between the first core layer 51 and the second core layer 53, i.e., the first memory layer 52, the first core layer 51, the second memory layer 54, and the second core layer 53 are stacked in that order from top to bottom. In step 1207, the first core layer 51 and the first memory layer 52 are bonded face-to-face. In step 1208, the second core layer 53 and the second memory layer 54 are bonded face-to-face. In step 1209, the first die group and the second die group are bonded back-to-back.
With this structure, the first operation area 511 and the second operation area 531 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the second memory layer 54 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 521 is transmitted out of the vertically stacked chip through the first input/output area 522 and the input/output TSVs, and data in the second memory area 541 is transmitted out of the vertically stacked chip through the second input/output area 542 and the input/output TSVs; the operation results of the first operation area 511 are transmitted to the off-chip memory 404 through the first physical access area 523 and the physical TSVs, and the operation results of the second operation area 531 are transmitted to the off-chip memory 404 through the second physical access area 543 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 8. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. The first die group includes a first core layer 81 (first die), a first memory layer 82 (second die), and a third memory layer 85 (third die); the second die group includes a second core layer 83 (first die), a second memory layer 84 (third die), and a fourth memory layer 86 (second die). The flowchart is shown in FIG. 13.
In step 1301, a first transceiver circuit is formed in the first die-to-die area 812 of the first core layer 81. In step 1302, a second transceiver circuit is formed in the second die-to-die area 832 of the second core layer 83. In step 1303, transceiver TSVs are formed in the first memory layer 82 and the fourth memory layer 86. In step 1304, input/output TSVs are formed in the second core layer 83, the second memory layer 84, and the fourth memory layer 86. In step 1305, physical TSVs are formed in the second core layer 83, the second memory layer 84, and the fourth memory layer 86. In step 1306, the first core layer 81 and the first memory layer 82 are bonded face-to-face. In step 1307, the third memory layer 85 and the first core layer 81 are bonded face-to-back. In step 1308, the second core layer 83 and the fourth memory layer 86 are bonded face-to-face. In step 1309, the second memory layer 84 and the second core layer 83 are bonded face-to-back. In step 1310, the third memory layer 85, the first core layer 81, and the first memory layer 82 are stacked in that order from top to bottom. In step 1311, the fourth memory layer 86, the second core layer 83, and the second memory layer 84 are stacked in that order from top to bottom. In step 1312, the first die group and the second die group are bonded back-to-back.
With this structure, the first operation area 811 and the second operation area 831 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the first memory layer 82 and the fourth memory layer 86 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 821 is transmitted out of the vertically stacked chip through the first input/output area 822 and the input/output TSVs, and data in the second memory area 841 is transmitted out of the vertically stacked chip through the second input/output area 842 and the input/output TSVs; the operation results of the first operation area 811 are transmitted to the off-chip memory 404 through the first physical access area 823 and the physical TSVs, and the operation results of the second operation area 831 are transmitted to the off-chip memory 404 through the second physical access area 843 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 9. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. The first die group consists, from top to bottom, of a first core layer 91 (first die) and a first memory layer 92 (second die); the second die group consists, from top to bottom, of a second core layer 93 (first die) and a second memory layer 94 (second die); the third die group includes only the third memory layer 95. The flowchart is shown in FIG. 14.
In step 1401, a first transceiver circuit is formed in the first die-to-die area 912 of the first core layer 91. In step 1402, a second transceiver circuit is formed in the second die-to-die area 932 of the second core layer 93. In step 1403, transceiver TSVs are formed in the first memory layer 92. In step 1404, input/output TSVs are formed in the third memory layer 95. In step 1405, physical TSVs are formed in the third memory layer 95. In step 1406, the first core layer 91 and the first memory layer 92 are bonded face-to-face. In step 1407, the second core layer 93 and the second memory layer 94 are bonded face-to-face. In step 1408, the first core layer 91 and the first memory layer 92 are stacked in that order from top to bottom. In step 1409, the second core layer 93 and the second memory layer 94 are stacked in that order from top to bottom. In step 1410, the first die group and the second die group are bonded back-to-back. In step 1411, the third die group and the second die group are bonded face-to-back.
In this embodiment, the third memory layer 95 includes a third memory area 951, a first input/output area 952, a second input/output area 953, a first physical access area 954, a second physical access area 955, and fifth TSVs 956. The third memory area 951 is provided with storage cells for temporarily storing the operation results of the first operation circuit or the second operation circuit. The first input/output area 952 is provided with a first input/output circuit serving as the interface through which the first die group communicates with the outside, i.e., realizing the function of the interface device 402. The second input/output area 953 is provided with a second input/output circuit serving as the interface through which the second die group communicates with the outside, likewise realizing the function of the interface device 402. The first physical access area 954 is provided with a first physical access circuit for connecting the first die group to the off-chip memory 404, and the second physical access area 955 is provided with a second physical access circuit for connecting the second die group to the off-chip memory 404.
The first die-to-die area 912 and the second die-to-die area 932 are stacked vertically, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are electrically connected directly through the first TSVs 913 and the second TSVs 922, without routing through an interposer such as the interposer 201 shown in FIG. 2.
FIG. 15 shows the back-to-back stacking process used in the foregoing embodiments.
In step 1501, circuits are formed on the logic side of a first wafer. Each wafer has a logic side and an opposite side: the logic side is the side on which logic circuits are formed to realize particular electrical functions, while the opposite side is the side of the wafer on which no logic circuits are laid out. Because the logic circuits are created by deposition, etching, and similar processes performed on top of the wafer, in this step, as shown in FIG. 16, the logic side 1602 of the first wafer 1601 faces upward and the opposite side 1603 faces downward.
In this step, a front-end-of-line (FEOL) layer 1604 is first formed on the logic side 1602, first TSVs 1605 are then formed on the logic side 1602, and finally a back-end-of-line (BEOL) layer 1606 is formed on the logic side 1602, so that the first TSVs 1605 are electrically connected to the BEOL layer 1606. The front-end-of-line process defines the transistor regions on the silicon substrate and then creates the N-type and P-type regions by ion implantation, realizing N-type and/or P-type field-effect transistors. The back-end-of-line process builds multiple layers of conductive metal lines that connect the transistors on the substrate as designed to realize particular functions. After these two processes, the FEOL layer 1604 and the BEOL layer 1606 are formed, respectively. The circuits on the logic side are mainly realized by the FEOL layer 1604, and the electrical connections among the circuit elements are realized by the BEOL layer 1606.
In step 1502, the first wafer 1601 is tested to discard defective parts. Wafer testing, also known as chip probing, aims to ensure that each die basically meets the circuit's characteristics or design specification, and typically includes verification of voltage, current, timing, and electrical function.
In step 1503, the first wafer 1601 is flipped. Each first wafer that was not discarded is rotated 180 degrees; after flipping, as shown in FIG. 17, the logic side 1602 of the first wafer 1601 faces downward and the opposite side 1603 faces upward.
In step 1504, a second wafer 1701 is bonded to the logic side 1602 to form the structure shown in FIG. 17.
In step 1505, the first TSVs 1605 are exposed on the opposite side 1603. First, the opposite side 1603 is ground, and the ground surface is smoothed by chemical mechanical polishing (CMP) to form the structure shown in FIG. 18. Next, the polished opposite side 1603 is plasma etched so that the first TSVs 1605 protrude from the surface of the opposite side 1603, forming the structure shown in FIG. 19. Silicon dioxide is then deposited on the plasma-etched surface by low-temperature chemical vapor deposition (LTCVD) to form the silicon dioxide layer 2001 shown in FIG. 20. Finally, the LTCVD surface is polished by CMP so that the silicon dioxide layer 2001 becomes planar and the first TSVs 1605 are exposed, yielding the structure shown in FIG. 21.
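The five surface treatments of step 1505 form a strictly ordered pipeline; the sketch below lists them in sequence, with surface-state names that are our own shorthand rather than terms from the patent.

```python
# Illustrative sketch: the TSV-reveal sequence of step 1505 as an ordered
# pipeline of (process, resulting surface state) pairs.
REVEAL_PIPELINE = [
    ("grind",        "thinned"),                # FIG. 18 precursor
    ("CMP",          "planar"),                 # FIG. 18
    ("plasma etch",  "TSVs protruding"),        # FIG. 19
    ("LTCVD SiO2",   "oxide-covered"),          # FIG. 20
    ("CMP",          "planar, TSVs exposed"),   # FIG. 21
]

state = "as-bonded"
for process, result in REVEAL_PIPELINE:
    state = result  # each process transforms the wafer's back surface
assert state == "planar, TSVs exposed"
```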
In step 1506, the first wafer 1601 is diced into a plurality of first dies. First, as shown in FIG. 22, the first wafer 1601 together with the second wafer 1701 is placed on a mount-on-frame 2201; the ejector pin 2202 is then pressed against the second wafer 1701, and the first wafer 1601 and the second wafer 1701 are cut according to the size and position of the circuits, i.e., along the dashed lines in the figure, finally producing a plurality of first dies 2203.
In step 1507, the first die 2203 is flipped 180 degrees to form the structure shown in FIG. 23.
In step 1508, the opposite side of the first die is bonded to the opposite side of a second die so that the first TSVs are in electrical communication with the second TSVs of the second die. The second die can be manufactured with existing processes; this embodiment does not restrict how the second die is made. As shown in FIG. 24, the opposite side 1603 of the first die 2203 is bonded to the opposite side 2402 of the second die 2401 so that the first TSVs 1605 are electrically connected to the second TSVs 2403 of the second die 2401.
A back-to-back structure has now been formed: the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, and the logic-side circuits on the two sides are electrically connected to each other through the first TSVs 1605 and the second TSVs 2403.
In step 1509, the first die 2203 is encapsulated using a molding compound formation process to form the structure shown in FIG. 25. Many molding processes exist in the prior art; for example, direct-bond encapsulation may be used, in which the first die 2203 and the second die 2401 are bonded directly to a printed circuit board or to a strip of plastic film covered with metal leads, and organic resin is dispensed around the first die 2203 to form the covering package body 2501.
In step 1510, the encapsulated first die is ground flat.
In step 1511, the ground first die is polished by CMP to form the structure shown in FIG. 26. This completes the entire back-to-back stacking process.
The solution of the present invention stacks the core layers vertically, bonds the processor core and the memory of the same die group face-to-face, and bonds adjacent die groups back-to-back, greatly shortening the transmission path of the die-to-die interface between the processor core and the memory within a die group. With current processes, the logic side is only about 0.3 micrometers thick and the bonding layer about 1 micrometer thick, so the transmission path between the processor core and the memory can be shortened to about 1.6 micrometers, which helps improve inter-core transmission efficiency.
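The 1.6-micrometer figure can be checked arithmetically. Reading the face-to-face interface as two logic sides plus one bonding layer is our interpretation of the numbers quoted above, not a breakdown the patent states explicitly:

```python
# Face-to-face bonding: logic side of the core layer + bonding layer + logic
# side of the memory layer (thicknesses from the current-process figures above).
logic_side_um = 0.3   # thickness of each die's logic side, in micrometers
bond_layer_um = 1.0   # thickness of the bonding layer between the two faces

path_um = 2 * logic_side_um + bond_layer_um
assert abs(path_um - 1.6) < 1e-9  # matches the 1.6 um quoted in the text
```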
根据不同的应用场景，本发明的电子设备或装置可以包括服务器、云端服务器、服务器集群、数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、PC设备、物联网终端、移动终端、手机、行车记录仪、导航仪、传感器、摄像头、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、视觉终端、自动驾驶终端、交通工具、家用电器、和/或医疗设备。所述交通工具包括飞机、轮船和/或车辆；所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机；所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。本发明的电子设备或装置还可以被应用于互联网、物联网、数据中心、能源、交通、公共管理、制造、教育、电网、电信、金融、零售、工地、医疗等领域。进一步，本发明的电子设备或装置还可以用于云端、边缘端、终端等与人工智能、大数据和/或云计算相关的应用场景中。在一个或多个实施例中，根据本发明方案的算力高的电子设备或装置可以应用于云端设备（例如云端服务器），而功耗小的电子设备或装置可以应用于终端设备和/或边缘端设备（例如智能手机或摄像头）。在一个或多个实施例中，云端设备的硬件信息和终端设备和/或边缘端设备的硬件信息相互兼容，从而可以根据终端设备和/或边缘端设备的硬件信息，从云端设备的硬件资源中匹配出合适的硬件资源来模拟终端设备和/或边缘端设备的硬件资源，以便完成端云一体或云边端一体的统一管理、调度和协同工作。According to different application scenarios, the electronic equipment or device of the present invention may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, PC equipment, Internet of Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, webcams, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, household appliances, and/or medical equipment. The vehicles include airplanes, ships, and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs. The electronic equipment or device of the present invention may also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic equipment or device of the present invention may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as the cloud, the edge, and terminals. In one or more embodiments, electronic equipment or devices with high computing power according to the solution of the present invention may be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption may be applied to terminal devices and/or edge devices (such as smartphones or cameras). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that, according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device, thereby achieving unified management, scheduling, and collaborative work in device-cloud integration or cloud-edge-device integration.
需要说明的是，为了简明的目的，本发明将一些方法及其实施例表述为一系列的动作及其组合，但是本领域技术人员可以理解本发明的方案并不受所描述的动作的顺序限制。因此，依据本发明的公开或教导，本领域技术人员可以理解其中的某些步骤可以采用其他顺序来执行或者同时执行。进一步，本领域技术人员可以理解本发明所描述的实施例可以视为可选实施例，即其中所涉及的动作或模块对于本发明某个或某些方案的实现并不一定是必需的。另外，根据方案的不同，本发明对一些实施例的描述也各有侧重。鉴于此，本领域技术人员可以理解本发明某个实施例中没有详述的部分，也可以参见其他实施例的相关描述。It should be noted that, for the sake of brevity, the present invention describes some methods and embodiments thereof as a series of actions and combinations thereof, but those skilled in the art will understand that the solution of the present invention is not limited by the order of the described actions. Therefore, based on the disclosure or teaching of the present invention, those skilled in the art will understand that some of the steps may be performed in other orders or concurrently. Further, those skilled in the art will understand that the embodiments described in the present invention may be regarded as optional embodiments; that is, the actions or modules involved therein are not necessarily required for the realization of one or more solutions of the present invention. In addition, depending on the solution, the descriptions of some embodiments of the present invention have different emphases. In view of this, for parts not described in detail in a certain embodiment of the present invention, those skilled in the art may refer to the relevant descriptions of other embodiments.
在具体实现方面，基于本发明的公开和教导，本领域技术人员可以理解本发明所公开的若干实施例也可以通过本文未公开的其他方式来实现。例如，就前文所述的电子设备或装置实施例中的各个单元来说，本文在考虑了逻辑功能的基础上对其进行拆分，而实际实现时也可以有另外的拆分方式。又例如，可以将多个单元或组件结合或者集成到另一个系统，或者对单元或组件中的一些特征或功能进行选择性地禁用。就不同单元或组件之间的连接关系而言，前文结合附图所讨论的连接可以是单元或组件之间的直接或间接耦合。在一些场景中，前述的直接或间接耦合涉及利用接口的通信连接，其中通信接口可以支持电性、光学、声学、磁性或其它形式的信号传输。In terms of specific implementation, based on the disclosure and teaching of the present invention, those skilled in the art will understand that several embodiments disclosed herein may also be implemented in other ways not disclosed herein. For example, the units in the foregoing electronic equipment or device embodiments are divided herein on the basis of logical functions, but other division methods may be used in actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features or functions of the units or components may be selectively disabled. As for the connection relationships between different units or components, the connections discussed above in conjunction with the drawings may be direct or indirect couplings between the units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection using an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
在本发明中，作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元示出的部件可以是或者也可以不是物理单元。前述部件或单元可以位于同一位置或者分布到多个网络单元上。另外，根据实际的需要，可以选择其中的部分或者全部单元来实现本发明实施例所述方案的目的。另外，在一些场景中，本发明实施例中的多个单元可以集成于一个单元中或者各个单元物理上单独存在。In the present invention, units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units. The aforementioned components or units may be located at the same location or distributed over multiple network units. In addition, according to actual needs, some or all of the units may be selected to achieve the purposes of the solutions described in the embodiments of the present invention. Furthermore, in some scenarios, multiple units in the embodiments of the present invention may be integrated into one unit, or each unit may exist physically separately.
在另外一些实现场景中，上述集成的单元也可以采用硬件的形式实现，即为具体的硬件电路，其可以包括数字电路和/或模拟电路等。电路的硬件结构的物理实现可以包括但不限于物理器件，而物理器件可以包括但不限于晶体管或忆阻器等器件。鉴于此，本文所述的各类装置（例如计算装置或其他处理装置）可以通过适当的硬件处理器来实现，例如中央处理器、GPU、FPGA、DSP和ASIC等。进一步，前述的所述存储单元或存储装置可以是任意适当的存储介质（包括磁存储介质或磁光存储介质等），其例如可以是可变电阻式存储器(Resistive Random Access Memory, RRAM)、动态随机存取存储器(Dynamic Random Access Memory, DRAM)、静态随机存取存储器(Static Random Access Memory, SRAM)、增强动态随机存取存储器(Enhanced Dynamic Random Access Memory, EDRAM)、高带宽存储器(High Bandwidth Memory, HBM)、混合存储器立方体(Hybrid Memory Cube, HMC)、ROM和RAM等。In some other implementation scenarios, the above integrated units may also be implemented in the form of hardware, that is, as specific hardware circuits, which may include digital circuits and/or analog circuits. The physical realization of the hardware structure of a circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors. In view of this, the various devices described herein (such as computing devices or other processing devices) may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including magnetic storage media, magneto-optical storage media, etc.), and may be, for example, a resistive random access memory (RRAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), an enhanced dynamic random access memory (EDRAM), a high bandwidth memory (HBM), a hybrid memory cube (HMC), a ROM, a RAM, or the like.
依据以下条款可更好地理解前述内容：The foregoing may be better understood in view of the following clauses:
条款A1.一种纵向堆叠芯片，包括：第一晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；以及第二晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；其中，所述第一晶粒组和所述第二晶粒组采用背对背制程。Clause A1. A vertically stacked chip, comprising: a first die group including a first die and a second die using a face-to-face process; and a second die group including a first die and a second die using a face-to-face process; wherein the first die group and the second die group use a back-to-back process.
条款A2.根据条款A1所述的纵向堆叠芯片,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个。Clause A2. The vertically stacked chips of Clause A1, wherein the first die is one of a processor core and a memory, and the second die is the other of a processor core and a memory.
条款A3.根据条款A2所述的纵向堆叠芯片，其中所述第一晶粒组的处理器核包括第一晶粒对晶粒区，生成有第一收发电路，所述第二晶粒组的处理器核包括第二晶粒对晶粒区，生成有第二收发电路；其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。Clause A3. The vertically stacked chip of Clause A2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed; wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
条款A4.根据条款A3所述的纵向堆叠芯片，其中所述第一晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第一晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。Clause A4. The vertically stacked chip of Clause A3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the first die group to electrically connect the first transceiver circuit and the second transceiver circuit.
条款A5.根据条款A4所述的纵向堆叠芯片，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有输入输出硅通孔，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A5. The vertically stacked chip of Clause A4, wherein the memory of the first die group includes a first input/output area, input/output through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias.
条款A6.根据条款A4所述的纵向堆叠芯片，其中所述第二晶粒组的内存包括第二输入输出区，所述第二晶粒组的内存中的数据通过所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A6. The vertically stacked chip of Clause A4, wherein the memory of the second die group includes a second input/output area, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the input/output through-silicon vias.
条款A7.根据条款A4所述的纵向堆叠芯片，连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有物理硅通孔，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Clause A7. The vertically stacked chip of Clause A4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, physical through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
条款A8.根据条款A3所述的纵向堆叠芯片，其中所述第二晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第二晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。Clause A8. The vertically stacked chip of Clause A3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the second die group to electrically connect the first transceiver circuit and the second transceiver circuit.
条款A9.根据条款A1所述的纵向堆叠芯片，其中所述第一晶粒组还包括第三晶粒，与所述第一晶粒组的所述第一晶粒采用面对背制程。Clause A9. The vertically stacked chip of Clause A1, wherein the first die group further includes a third die, which uses a face-to-back process with the first die of the first die group.
条款A10.根据条款A9所述的纵向堆叠芯片,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。Clause A10. The vertically stacked chip of Clause A9, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
条款A11.根据条款A1所述的纵向堆叠芯片，还包括第三晶粒组，与所述第二晶粒组采用面对背制程。Clause A11. The vertically stacked chip of Clause A1, further comprising a third die group, which uses a face-to-back process with the second die group.
条款A12.根据条款A1至11所述任一项的纵向堆叠芯片，其中各层以倒装芯片球栅格阵列方式封装。Clause A12. The vertically stacked chip of any one of Clauses A1 to A11, wherein each layer is packaged in a flip-chip ball grid array manner.
条款A13.根据条款A1至11所述任一项的纵向堆叠芯片，其中各层以CoWoS方式封装。Clause A13. The vertically stacked chip of any one of Clauses A1 to A11, wherein each layer is packaged in a CoWoS manner.
条款A14.一种集成电路装置，包括根据条款A1至11任一项所述的纵向堆叠芯片。Clause A14. An integrated circuit device, comprising the vertically stacked chip according to any one of Clauses A1 to A11.
条款A15.一种板卡，包括根据条款A14所述的集成电路装置。Clause A15. A board card, comprising the integrated circuit device according to Clause A14.
条款A16.一种纵向堆叠芯片的方法，所述纵向堆叠芯片包括第一晶粒组及第二晶粒组，所述方法包括：面对面贴合所述第一晶粒组中的第一晶粒和第二晶粒；面对面贴合所述第二晶粒组中的第一晶粒和第二晶粒；以及背对背贴合所述第一晶粒组和所述第二晶粒组。Clause A16. A method for vertically stacking a chip, the vertically stacked chip including a first die group and a second die group, the method comprising: bonding a first die and a second die in the first die group face to face; bonding a first die and a second die in the second die group face to face; and bonding the first die group and the second die group back to back.
条款A17.根据条款A16所述的方法，其中所述第一晶粒为处理器核及内存其中之一，所述第二晶粒为处理器核及内存的另一个，所述方法还包括：生成第一收发电路于所述第一晶粒组的处理器核中的第一晶粒对晶粒区；以及生成第二收发电路于所述第二晶粒组的处理器核中的第二晶粒对晶粒区；其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。Clause A17. The method of Clause A16, wherein the first die is one of a processor core and a memory, and the second die is the other of the processor core and the memory, the method further comprising: forming a first transceiver circuit in a first die-to-die area of the processor core of the first die group; and forming a second transceiver circuit in a second die-to-die area of the processor core of the second die group; wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
条款A18.根据条款A17所述的方法，还包括：生成收发硅通孔于所述第一晶粒组的内存；设置所述第一晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间；其中，所述第一晶粒组的内存通过所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Clause A18. The method of Clause A17, further comprising: forming transceiver through-silicon vias in the memory of the first die group; and disposing the memory of the first die group between the processor core of the first die group and the processor core of the second die group; wherein the memory of the first die group is electrically connected to the first transceiver circuit and the second transceiver circuit through the transceiver through-silicon vias.
条款A19.根据条款A18所述的方法，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的内存包括第二输入输出区，所述方法还包括：生成输入输出硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；其中，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外，且所述第二晶粒组的内存中的数据通过所述第二输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A19. The method of Clause A18, wherein the memory of the first die group includes a first input/output area and the memory of the second die group includes a second input/output area, the method further comprising: forming input/output through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the second input/output area and the input/output through-silicon vias.
条款A20.根据条款A17所述的方法，所述纵向堆叠芯片连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述方法还包括：生成物理硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；其中，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Clause A20. The method of Clause A17, wherein the vertically stacked chip is connected to an off-chip memory and the memory of the first die group further includes a first physical area, the method further comprising: forming physical through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
条款A21.根据条款A16所述的方法，还包括：生成收发硅通孔于所述第二晶粒组的内存；以及设置所述第二晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间；其中，所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Clause A21. The method of Clause A16, further comprising: forming transceiver through-silicon vias in the memory of the second die group; and disposing the memory of the second die group between the processor core of the first die group and the processor core of the second die group; wherein the transceiver through-silicon vias electrically connect the first transceiver circuit and the second transceiver circuit.
条款A22.根据条款A16所述的方法，其中所述第一晶粒组还包括第三晶粒，所述方法包括：面对背贴合所述第三晶粒与所述第一晶粒组的所述第一晶粒。Clause A22. The method of Clause A16, wherein the first die group further includes a third die, the method comprising: bonding the third die and the first die of the first die group face to back.
条款A23.根据条款A22所述的方法,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。Clause A23. The method of Clause A22, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
条款A24.根据条款A16所述的方法，所述纵向堆叠芯片还包括第三晶粒组，所述方法还包括：面对背贴合所述第三晶粒组与所述第二晶粒组。Clause A24. The method of Clause A16, wherein the vertically stacked chip further includes a third die group, the method further comprising: bonding the third die group and the second die group face to back.
以上对本发明实施例进行了详细介绍，本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本发明的限制。The embodiments of the present invention have been described in detail above, and specific examples are used herein to illustrate the principles and implementations of the present invention. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of the present invention. Meanwhile, those of ordinary skill in the art may, based on the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (24)

  1. 一种纵向堆叠芯片,包括:A vertically stacked chip comprising:
    第一晶粒组,包括采用面对面制程的第一晶粒和第二晶粒;以及a first die group comprising a first die and a second die using a face-to-face process; and
    第二晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；a second die group comprising a first die and a second die using a face-to-face process;
    其中,所述第一晶粒组和所述第二晶粒组采用背对背制程。Wherein, the first die group and the second die group adopt a back-to-back process.
  2. 根据权利要求1所述的纵向堆叠芯片,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个。The vertically stacked chip according to claim 1, wherein the first die is one of a processor core and a memory, and the second die is the other one of a processor core and a memory.
  3. 根据权利要求2所述的纵向堆叠芯片，其中所述第一晶粒组的处理器核包括第一晶粒对晶粒区，生成有第一收发电路，所述第二晶粒组的处理器核包括第二晶粒对晶粒区，生成有第二收发电路；The vertically stacked chip according to claim 2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed;
    其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  4. 根据权利要求3所述的纵向堆叠芯片，其中所述第一晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第一晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。The vertically stacked chip according to claim 3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the first die group to electrically connect the first transceiver circuit and the second transceiver circuit.
  5. 根据权利要求4所述的纵向堆叠芯片，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有输入输出硅通孔，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。The vertically stacked chip according to claim 4, wherein the memory of the first die group includes a first input/output area, input/output through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias.
  6. 根据权利要求4所述的纵向堆叠芯片，其中所述第二晶粒组的内存包括第二输入输出区，所述第二晶粒组的内存中的数据通过所述输入输出硅通孔传送至所述纵向堆叠芯片外。The vertically stacked chip according to claim 4, wherein the memory of the second die group includes a second input/output area, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the input/output through-silicon vias.
  7. 根据权利要求4所述的纵向堆叠芯片，连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有物理硅通孔，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。The vertically stacked chip according to claim 4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, physical through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  8. 根据权利要求3所述的纵向堆叠芯片，其中所述第二晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第二晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。The vertically stacked chip according to claim 3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the second die group to electrically connect the first transceiver circuit and the second transceiver circuit.
  9. 根据权利要求1所述的纵向堆叠芯片，其中所述第一晶粒组还包括第三晶粒，与所述第一晶粒组的所述第一晶粒采用面对背制程。The vertically stacked chip according to claim 1, wherein the first die group further includes a third die, which uses a face-to-back process with the first die of the first die group.
  10. 根据权利要求9所述的纵向堆叠芯片,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。The vertically stacked chips of claim 9, wherein the first die is a processor core, the second die is a memory, and the third die is a memory.
  11. 根据权利要求1所述的纵向堆叠芯片,还包括第三晶粒组,与所述第二晶粒组采用面对背制程。The vertically stacked chips according to claim 1 , further comprising a third die group, which adopts a face-to-back process with the second die group.
  12. 根据权利要求1至11所述任一项的纵向堆叠芯片,其中各层以倒装芯片球栅格阵列(flip chip ball grid array,FCBGA)方式封装。The vertically stacked chip according to any one of claims 1 to 11, wherein each layer is packaged in a flip chip ball grid array (FCBGA) manner.
  13. 根据权利要求1至11所述任一项的纵向堆叠芯片,其中各层以CoWoS(chip on wafer on substrate)方式封装。The vertically stacked chip according to any one of claims 1 to 11, wherein each layer is packaged in a CoWoS (chip on wafer on substrate) manner.
  14. 一种集成电路装置,包括根据权利要求1至11任一项所述的纵向堆叠芯片。An integrated circuit device comprising vertically stacked chips according to any one of claims 1 to 11.
  15. 一种板卡,包括根据权利要求14所述的集成电路装置。A board comprising the integrated circuit device according to claim 14.
  16. 一种纵向堆叠芯片的方法,所述纵向堆叠芯片包括第一晶粒组及第二晶粒组,所述方法包括:A method for vertically stacking chips, the vertically stacking chips comprising a first die group and a second die group, the method comprising:
    面对面贴合所述第一晶粒组中的第一晶粒和第二晶粒;bonding the first die and the second die in the first die group face to face;
    面对面贴合所述第二晶粒组中的第一晶粒和第二晶粒;以及bonding the first die and the second die in the second die group face to face; and
    背对背贴合所述第一晶粒组和所述第二晶粒组。bonding the first die group and the second die group back to back.
  17. 根据权利要求16所述的方法,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个,所述方法还包括:The method according to claim 16, wherein the first die is one of the processor core and the memory, and the second die is the other of the processor core and the memory, and the method further comprises:
    生成第一收发电路于所述第一晶粒组的处理器核中的第一晶粒对晶粒区;以及generating a first transceiver circuit in a first die-to-die region of a processor core of the first die group; and
    生成第二收发电路于所述第二晶粒组的处理器核中的第二晶粒对晶粒区;generating a second transceiver circuit in a second die-to-die region of the processor core of the second die group;
    其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  18. 根据权利要求17所述的方法,还包括:The method of claim 17, further comprising:
    生成收发硅通孔于所述第一晶粒组的内存;generating transceiver TSVs in the memory of the first die group;
    设置所述第一晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间;disposing the memory of the first die group between the processor cores of the first die group and the processor cores of the second die group;
    其中,所述第一晶粒组的内存通过所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Wherein, the memory of the first die group is electrically connected to the first transceiver circuit and the second transceiver circuit through the transceiver TSV.
  19. 根据权利要求18所述的方法,其中所述第一晶粒组的内存包括第一输入输出区,所述第二晶粒组的内存包括第二输入输出区,所述方法还包括:The method of claim 18, wherein the memory of the first die group includes a first input-output area and the memory of the second die group includes a second input-output area, the method further comprising:
    生成输入输出硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；generating input/output through-silicon vias in the processor core of the second die group and the memory of the second die group;
    其中，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外，且所述第二晶粒组的内存中的数据通过所述第二输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。wherein data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the second input/output area and the input/output through-silicon vias.
  20. 根据权利要求17所述的方法,所述纵向堆叠芯片连接至片外内存,其中所述第一晶粒组的内存还包括第一物理区,所述方法还包括:The method of claim 17, wherein the vertically stacked chips are connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, the method further comprising:
    生成物理硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；generating physical through-silicon vias in the processor core of the second die group and the memory of the second die group;
    其中,所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Wherein, the operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical TSV.
  21. 根据权利要求16所述的方法,还包括:The method of claim 16, further comprising:
    生成收发硅通孔于所述第二晶粒组的内存；以及generating transceiver through-silicon vias in the memory of the second die group; and
    设置所述第二晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间;disposing the memory of the second die group between the processor cores of the first die group and the processor cores of the second die group;
    其中,所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Wherein, the transceiver TSV is electrically connected to the first transceiver circuit and the second transceiver circuit.
  22. 根据权利要求16所述的方法,其中所述第一晶粒组还包括第三晶粒,所述方法包括:The method of claim 16, wherein the first group of dies further comprises a third die, the method comprising:
    面对背贴合所述第三晶粒与所述第一晶粒组的所述第一晶粒。The third die and the first die of the first die group are bonded face-to-back.
  23. 根据权利要求22所述的方法,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。The method of claim 22, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
  24. 根据权利要求16所述的方法，所述纵向堆叠芯片还包括第三晶粒组，所述方法还包括：The method according to claim 16, wherein the vertically stacked chip further includes a third die group, the method further comprising:
    面对背贴合所述第三晶粒组与所述第二晶粒组。The third crystal grain group and the second crystal grain group are bonded face to back.
PCT/CN2022/122373 2021-10-08 2022-09-29 Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor WO2023056876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111172917.6 2021-10-08
CN202111172917.6A CN115966535A (en) 2021-10-08 2021-10-08 Longitudinally stacked chip, integrated circuit device, board card and manufacturing method thereof

Publications (1)

Publication Number Publication Date
WO2023056876A1 true WO2023056876A1 (en) 2023-04-13

Family

ID=85803154

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122373 WO2023056876A1 (en) 2021-10-08 2022-09-29 Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor

Country Status (3)

Country Link
CN (1) CN115966535A (en)
TW (1) TW202316621A (en)
WO (1) WO2023056876A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117222234A (en) * 2023-11-07 2023-12-12 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101090107A (en) * 2006-06-13 2007-12-19 日月光半导体制造股份有限公司 Package structure of crystal particle and manufacturing method thereof
CN107527877A (en) * 2016-06-15 2017-12-29 联发科技股份有限公司 Semiconductor packages
CN113299632A (en) * 2020-02-21 2021-08-24 赛灵思公司 Integrated circuit device with stacked dies of mirror circuit


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117222234A (en) * 2023-11-07 2023-12-12 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface
CN117222234B (en) * 2023-11-07 2024-02-23 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface

Also Published As

Publication number Publication date
CN115966535A (en) 2023-04-14
TW202316621A (en) 2023-04-16

Similar Documents

Publication Publication Date Title
TWI748291B (en) Integrated circuit device, interconnection device die and fabrication method for system on integrated chip
US10445269B2 (en) Stacked semiconductor device assembly in computer system
US20160300823A1 (en) Package-on-package options with multiple layer 3-d stacking
TWI591773B (en) Die stacking techniques in bga memory package for small footprint cpu and memory motherboard design
WO2023078006A1 (en) Accelerator structure, method for generating accelerator structure, and device thereof
TW201515176A (en) Flexible memory system with a controller and a stack of memory
KR20130054382A (en) Wide input output memory with low density, low latency and high density, high latency blocks
US20200066640A1 (en) Hybrid technology 3-d die stacking
US20230352412A1 (en) Multiple die package using an embedded bridge connecting dies
KR20110006482A (en) Multi chip package for use in multi processor system having memory link architecture
WO2023056876A1 (en) Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor
KR102629195B1 (en) How to layout package structures, devices, board cards, and integrated circuits
WO2023056875A1 (en) Multi-core chip, integrated circuit apparatus, and board card and manufacturing procedure method therefor
WO2022242333A1 (en) Wafer chip having cowos package structure, wafer, device, and generation method therefor
CN115966517A (en) Back-to-back stacking process, medium and computer equipment
TWI834089B (en) A system-on-integrated-chip, a method for producing the same and a readable storage medium
CN110544673A (en) Multilayer fused three-dimensional system integrated structure
US20230343718A1 (en) Homogeneous chiplets configurable as a two-dimensional system or a three-dimensional system
CN116976411A (en) Device, chip, equipment, memory calculation scheduling and multi-layer neural network training method
CN117690808A (en) Method for producing chip
CN117690893A (en) Chip and product comprising same
CN113745197A (en) Three-dimensional heterogeneous integrated programmable array chip structure and electronic device
CN117525005A (en) Chip assembly with vacuum cavity vapor chamber, packaging structure and preparation method
CN116053229A (en) Chip package substrate and packaged chip

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877895

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE