WO2023056875A1 - Multi-core chip, integrated circuit device, board, and manufacturing method thereof - Google Patents

Multi-core chip, integrated circuit device, board, and manufacturing method thereof

Info

Publication number
WO2023056875A1
WO2023056875A1 (PCT/CN2022/122372)
Authority
WO
WIPO (PCT)
Prior art keywords
memory
layer
circuit
core
area
Prior art date
Application number
PCT/CN2022/122372
Other languages
English (en)
French (fr)
Inventor
邱志威
陈帅
高崧
Original Assignee
寒武纪(西安)集成电路有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 寒武纪(西安)集成电路有限公司
Publication of WO2023056875A1

Classifications

    • H: ELECTRICITY
    • H01: ELECTRIC ELEMENTS
    • H01L: SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L 21/00: Processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or of parts thereof
    • H01L 21/70: Manufacture or treatment of devices consisting of a plurality of solid state components formed in or on a common substrate or of parts thereof; Manufacture of integrated circuit devices or of parts thereof
    • H01L 21/71: Manufacture of specific parts of devices defined in group H01L 21/70
    • H01L 21/768: Applying interconnections to be used for carrying current between separate components within a device comprising conductors and dielectrics
    • H01L 23/00: Details of semiconductor or other solid state devices
    • H01L 23/48: Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads, terminal arrangements; Selection of materials therefor
    • H01L 25/00: Assemblies consisting of a plurality of individual semiconductor or other solid state devices; Multistep manufacturing processes thereof
    • H01L 25/03: Assemblies in which all the devices are of a type provided for in the same subgroup of groups H01L 27/00 - H01L 33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes
    • H01L 25/04: Assemblies in which the devices do not have separate containers
    • H01L 25/065: Assemblies in which the devices are of a type provided for in group H01L 27/00
    • H01L 25/18: Assemblies in which the devices are of types provided for in two or more different subgroups of the same main group of groups H01L 27/00 - H01L 33/00, or in a single subclass of H10K, H10N
    • H01L 25/50: Multistep manufacturing processes of assemblies consisting of devices, each device being of a type provided for in group H01L 27/00 or H01L 29/00

Definitions

  • the present invention generally relates to the field of semiconductors. More specifically, the present invention relates to a multi-core chip, an integrated circuit device, a board and a manufacturing method thereof.
  • D2D: die-to-die
  • a die-to-die interface is a functional block that occupies a small area of the die to provide a data interface between two modules or two die assembled in the same package.
  • Die-to-die interfaces utilize very short channels to connect modules or dies within a package, with transfer rates and bandwidths that exceed traditional chip-to-chip interfaces.
  • two modules or dies connected by a die-to-die interface are usually placed side by side, with their die-to-die interfaces adjacent; the two die-to-die interfaces are electrically connected through the interposer layer below.
  • although the transfer rate and bandwidth of a die-to-die interface are excellent, when data is transferred through the underlying interposer the transmission path can be millimeters long. An overly long transmission path attenuates the signal and reduces speed, which still cannot meet the requirements of high-intensity computing.
  • the solution of the present invention provides a multi-core chip, an integrated circuit device, a board and a manufacturing method thereof.
  • the present invention discloses a multi-core chip including a first core layer and a second core layer.
  • the first core layer includes: a first operation area, in which a first operation circuit is formed; and a first die-to-die area, in which a first transceiver circuit is formed.
  • the second core layer includes: a second operation area, in which a second operation circuit is formed; and a second die-to-die area, in which a second transceiver circuit is formed.
  • the first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit.
  • the present invention discloses an integrated circuit device including the aforementioned multi-core chip; and also discloses a board including the aforementioned integrated circuit device.
  • the present invention discloses a method for manufacturing a multi-core chip, comprising: generating a first core layer, which includes a first operation area in which a first operation circuit is generated and a first die-to-die area in which a first transceiver circuit is generated;
  • and generating a second core layer, which includes a second operation area in which a second operation circuit is generated and a second die-to-die area in which a second transceiver circuit is generated.
  • the first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit.
  • since the two die-to-die interfaces no longer need to transmit data through the interposer, the transmission path between them is greatly shortened, which helps to improve the transfer efficiency between cores.
  • FIG. 1 shows a top view of the layout of a package structure including a die-to-die interface;
  • FIG. 2 shows a cross-sectional view of the package structure in FIG. 1 along the dotted line;
  • FIG. 3 shows a structural diagram of a board according to an embodiment of the present invention;
  • FIG. 4 shows a schematic diagram of a chip according to an embodiment of the present invention;
  • FIG. 5 shows a structural diagram of an integrated circuit device according to an embodiment of the present invention;
  • FIG. 6 shows a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • FIG. 7 shows a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • FIG. 8 shows a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • FIG. 9 shows a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • FIG. 10 shows a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • FIG. 11 shows a flow chart of manufacturing the multi-core chip of FIG. 4 according to another embodiment of the present invention;
  • FIG. 12 shows a flow chart of manufacturing the multi-core chip of FIG. 6 according to another embodiment of the present invention;
  • FIG. 13 shows a flow chart of manufacturing the multi-core chip of FIG. 7 according to another embodiment of the present invention;
  • FIG. 14 shows a flow chart of manufacturing the multi-core chip of FIG. 8 according to another embodiment of the present invention;
  • FIG. 15 shows a flow chart of manufacturing the multi-core chip of FIG. 9 according to another embodiment of the present invention;
  • FIG. 16 shows a flow chart of manufacturing the multi-core chip of FIG. 10 according to another embodiment of the present invention.
  • the term “if” may be interpreted as “when” or “once” or “in response to determining” or “in response to detecting” depending on the context.
  • a die-to-die interface, like any other chip-to-chip interface, is a data link channel between two dies.
  • the die-to-die interface is logically divided into physical layer, link layer, and transaction layer, and provides a standardized parallel interface to the internal interconnect structure.
  • the layout of the package structure is located in a molding compound area 10 of a chip.
  • the molding compound area 10 includes a system area and a storage area.
  • An exemplary system area is located in the center of the molding compound area 10 for placing two SoCs 101 , and storage areas are respectively located on both sides of the system area for placing eight off-chip memories 102 .
  • the system area also has a die-to-die area 103 , a physical area 104 and an input-output area 105 .
  • the die-to-die area 103 is formed with a transceiver circuit for data sharing between the two SoCs 101;
  • the physical area 104 is formed with a physical access circuit for accessing the off-chip memory 102;
  • the input-output area 105 is formed with an input-output circuit, which serves as an interface for external communication of the system on chip 101.
  • the memory 106 is also placed in the system area as a temporary storage space of the system on chip 101 , its capacity is smaller than that of the off-chip memory 102 , but the data transfer rate is higher than that of the off-chip memory 102 .
  • FIG. 2 shows a cross-sectional view of the package structure in FIG. 1 along the dotted line direction.
  • the system area is divided into upper and lower layers.
  • the upper layer is the SoC 101
  • the lower layer is the transceiver circuit of the die-to-die area 103 , the memory 106 and the I/O circuit of the I/O area 105 .
  • the packaging structure further includes an interposer 201 and a substrate 202 , and the interposer 201 is disposed on the substrate 202 .
  • the path is: the system on chip 101 at the sending end → the transceiver circuit of the die-to-die area 103 at the sending end → the interposer 201 → the transceiver circuit of the die-to-die area 103 at the receiving end → the system on chip 101 at the receiving end, which realizes the technical effect of low delay and low power consumption for the die-to-die interface.
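The hop sequence just described can be sketched as a simple list; the Python names below are illustrative placeholders for the numbered elements of FIGS. 1-2, not terminology from the patent:

```python
# Hop sequence for a die-to-die transfer in the side-by-side package of
# FIGS. 1-2. All identifiers are illustrative placeholders.
INTERPOSER_D2D_PATH = [
    "soc_101_tx",              # system on chip at the sending end
    "d2d_transceiver_103_tx",  # transceiver circuit, sending side
    "interposer_201",          # millimeter-scale routing under both dies
    "d2d_transceiver_103_rx",  # transceiver circuit, receiving side
    "soc_101_rx",              # system on chip at the receiving end
]

def round_trip(path):
    """A reply traverses the same hops in reverse order."""
    return path + list(reversed(path))[1:]

# The reply ends back at the original sender.
assert round_trip(INTERPOSER_D2D_PATH)[-1] == "soc_101_tx"
```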
  • FIG. 3 shows a schematic structural diagram of a board 30 according to an embodiment of the present invention.
  • the board 30 includes a chip 301, which is a system-on-chip integrated with one or more combined processing devices.
  • the combined processing device is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in computer vision, speech, natural language processing and data mining.
  • deep learning technology is widely used in the field of cloud intelligence.
  • a notable feature of cloud intelligence applications is the large amount of input data, which has high requirements for the storage capacity and computing power of the platform.
  • the board 30 of this embodiment is suitable for cloud intelligence applications, with huge off-chip storage, on-chip storage and powerful computing capabilities.
  • the chip 301 is connected to an external device 303 through an external interface device 302 .
  • the external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card or a wifi interface, and the like.
  • the data to be processed can be transmitted to the chip 301 by the external device 303 through the external interface device 302 .
  • the calculation result of the chip 301 can be sent back to the external device 303 via the external interface device 302 .
  • the external interface device 302 may have different interface forms, such as a PCIe interface and the like.
  • the chip 301 includes computing means and processing means.
  • the computing device is configured to perform operations specified by the user, and is mainly implemented as a single-core intelligent processor or a multi-core intelligent processor, which is used to perform deep learning or machine learning calculations.
  • the processing device performs basic control including but not limited to data transfer, starting and/or stopping the computing device, and the like.
  • the processing means may be one or more of a central processing unit (CPU), a graphics processing unit (GPU), or other general-purpose and/or special-purpose processors.
  • these processors include but are not limited to digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • considering only the computing device of this embodiment, it can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device and the processing device are considered together, the two form a heterogeneous multi-core structure.
  • the board 30 also includes a storage device 304 for storing data, which includes one or more storage units 305 .
  • the storage device 304 is connected to the control device 306 and the chip 301 through a bus for data transmission.
  • the control device 306 in the board 30 is configured to regulate the state of the chip 301 .
  • the control device 306 may include a microcontroller (Micro Controller Unit, MCU).
  • FIG. 4 shows a schematic diagram of the chip 301 of this embodiment. It is a multi-core chip comprising a first core layer 41 and a second core layer 42. In practice the first core layer 41 and the second core layer 42 are vertically stacked together; they are drawn separated in FIG. 4 only for convenience of illustration.
  • the first core layer 41 includes a first computing region 411 , a first die-to-die region 412 and a first through silicon via (TSV) 413 .
  • the first operation area 411 is formed with a first operation circuit to realize the function of the calculation device;
  • the first die-to-die area 412 is formed with a first transceiver circuit, which is used as a die-to-die interface of the first operation circuit;
  • the first TSV 413 is used to realize electrical interconnection of stacked chips in a three-dimensional integrated circuit.
  • the second core layer 42 includes a second computing region 421 , a second die-to-die region 422 and a second TSV 423 .
  • the second operation area 421 is formed with a second operation circuit to realize the function of the processing device;
  • the second die-to-die area 422 is formed with a second transceiver circuit, which is used as a die-to-die interface of the second operation circuit;
  • the second TSV 423 is also used to realize the electrical interconnection of the stacked chips in the three-dimensional integrated circuit.
  • the first operation area 411 and the second operation area 421 are also respectively generated with a memory 414 and a memory 424 for temporarily storing the operation results of the first operation circuit and the second operation circuit.
  • the memory 414 and the memory 424 are arranged directly in the first operation area 411 and the second operation area 421, without routing through an interposer, so the data transmission rate is fast.
  • the first core layer 41 further includes an I/O area 415 and a physical area 416
  • the second core layer 42 further includes an I/O area 425 and a physical area 426 .
  • the input and output area 415 is formed with input and output circuits, which are used as an interface for the first core layer 41 to communicate with the outside world.
  • the physical area 416 has a physical access circuit for the first core layer 41 to access the off-chip memory
  • the physical area 426 has a physical access circuit for the second core layer 42 to access the off-chip memory.
  • the first computing circuit and the second computing circuit perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit.
  • when the computing device intends to transmit data to the processing device, the data travels the following path: the first computing circuit of the first computing area 411 → the first transceiver circuit of the first die-to-die area 412 → the first TSV 413 → the second transceiver circuit of the second die-to-die area 422 → the second computing circuit of the second computing area 421;
  • when the processing device intends to transmit data to the computing device, the data arrives through the aforementioned reverse path.
  • the memory area 414 transmits data to other devices through the input-output circuit. Specifically, when the data in the memory area 414 is to be transmitted to other off-chip devices, the data reaches them through the following path: the input-output circuit of the input-output area 415 → the first TSV 413 → the second TSV 423; when other off-chip devices intend to transmit data to the memory area 414, the data arrives at the memory area 414 through the aforementioned reverse path. It should be noted that some specific TSVs among the first TSV 413 and the second TSV 423 are dedicated to electrically conducting the data of the input-output circuits.
  • the data in the memory area 424 reaches other off-chip devices through the following path: the input-output circuit of the input-output area 425 → the second TSV 423;
  • when other off-chip devices intend to transmit data to the memory area 424, the data arrives at the memory area 424 through the aforementioned reverse path.
  • the memory area 414 transmits the data to the off-chip memory through the physical access circuit. Specifically, when the data in the memory area 414 is to be transferred to the off-chip memory, the data reaches the off-chip memory through the following path: the physical access circuit of the physical area 416 ⁇ the first TSV 413 ⁇ the second TSV 423; When the off-chip memory intends to transmit input data to the memory area 414 for processing by the computing device, the data arrives at the memory area 414 through the aforementioned reverse path. It should be noted that some specific TSVs in the first TSV 413 and the second TSV 423 are specially designed to electrically conduct data for physically accessing the circuit.
  • the memory area 424 transmits data to the off-chip memory through the physical access circuit. Specifically, when the data in the memory area 424 is to be transmitted to the off-chip memory, the data reaches the off-chip memory through the following path: the physical access circuit of the physical area 426 → the second TSV 423; when the off-chip memory intends to transmit input data to the memory area 424 for processing by the processing device, the data arrives at the memory area 424 through the aforementioned reverse path.
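The data paths described above for the stacked chip of FIG. 4 can be summarized as a small routing table; the identifiers mirror the reference numerals in the text, but the table itself is only an illustrative reconstruction, not part of the claims:

```python
# Illustrative routing table for the stacked chip of FIG. 4.
# Keys are (source, destination); values are the hop sequences.
ROUTES = {
    # computing device -> processing device (interlayer, via TSV)
    ("compute_411", "compute_421"): [
        "op_circuit_411", "transceiver_412", "tsv_413",
        "transceiver_422", "op_circuit_421",
    ],
    # memory 414 -> other off-chip devices (via the I/O circuit)
    ("memory_414", "off_chip_device"): [
        "io_circuit_415", "tsv_413", "tsv_423",
    ],
    # memory 414 -> off-chip memory (via the physical access circuit)
    ("memory_414", "off_chip_memory"): [
        "phy_circuit_416", "tsv_413", "tsv_423",
    ],
}

def reverse_route(src, dst):
    """Data travelling the other way uses the same hops in reverse."""
    return list(reversed(ROUTES[(src, dst)]))
```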
  • the first die-to-die area 412 and the second die-to-die area 422 are vertically stacked, so that the die-to-die interface of the first core layer 41 and the die-to-die interface of the second core layer 42 are directly electrically connected through the first TSV 413, without using the interposer 201 shown in FIG. 2 for transmission.
  • the length of a TSV is about tens of microns. Compared with the millimeter-level length of the interposer path, the data transmission of this embodiment is faster and the signal strength is better.
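Taking "tens of microns" as roughly 50 µm and "millimeter-level" as roughly 2 mm (assumed figures, chosen only to show the order of magnitude), the shortening works out as:

```python
# Order-of-magnitude comparison of the two channel lengths cited above.
# The exact figures are assumptions for illustration only.
TSV_LENGTH_UM = 50          # "tens of microns"
INTERPOSER_LENGTH_UM = 2_000  # "millimeter-level" (2 mm)

ratio = INTERPOSER_LENGTH_UM / TSV_LENGTH_UM
print(f"TSV channel is ~{ratio:.0f}x shorter")  # prints "TSV channel is ~40x shorter"
```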
  • the combined processing device 50 includes a computing device 501 , an interface device 502 , a processing device 503 and an off-chip memory 504 .
  • the computing device 501 is configured to perform operations specified by the user, and is mainly implemented as a single-core intelligent processor or a multi-core intelligent processor for performing deep learning or machine learning calculations, which can interact with the processing device 503 through the interface device 502 to Work together to complete user-specified operations.
  • the interface device 502 is connected to the bus for connecting with other devices, such as the control device 306 and the external interface device 302 in FIG. 3 .
  • the processing device 503 performs basic control including but not limited to data transfer, starting and/or stopping of the computing device 501 .
  • the processing device 503 may be one or more of a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors; these processors include but are not limited to digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • considering only the computing device 501 of this embodiment, it can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device 501 and the processing device 503 are considered together, they form a heterogeneous multi-core structure.
  • the off-chip memory 504 is used to store data to be processed, which is a DDR memory, usually 16G or larger in size, and is used to store data of the computing device 501 and/or the processing device 503 .
  • Figure 6 shows a schematic diagram of vertical stacking in this embodiment.
  • this embodiment is also a multi-core chip, including a first core layer 61, a second core layer 62 and a memory layer 63, which are in practice vertically stacked in sequence from top to bottom; the layers in FIG. 6 are drawn separated only for convenience of illustration.
  • the first core layer 61 includes a first operation area 611, which occupies the entire logic layer of the first core layer 61 (the top side of the first core layer 61 in the figure); the first core layer 61 also includes, in a dedicated area, the first die-to-die area 612 and the first TSV 613.
  • the second core layer 62 includes a second operation area 621, which occupies the entire logic layer of the second core layer 62 (the top side of the second core layer 62 in the figure); the second core layer 62 also includes, in a dedicated area, the second die-to-die area 622 and the second TSV 623.
  • the first die-to-die area 612 and the second die-to-die area 622 are vertically aligned with each other. Their function and effect are the same as those of the foregoing embodiments, so details are not repeated.
  • the memory layer 63 includes a memory area 631, a first I/O area 632, a second I/O area 633, a first physical area 634, a second physical area 635, and a third TSV 636.
  • the memory area 631 is formed with storage units for temporarily storing the calculation results of the first operation circuit or the second operation circuit;
  • the first input-output area 632 is formed with a first input-output circuit, which serves as an interface for the first operation circuit to communicate with the outside world, realizing the function of the interface device 502;
  • the second input-output area 633 is formed with a second input-output circuit, which serves as an interface for the second operation circuit to communicate with the outside world, also realizing the function of the interface device 502;
  • the first physical area 634 is formed with a first physical access circuit for sending the calculation results of the first operation circuit stored in the memory area 631 to the off-chip memory 504, and the second physical area 635 is formed with a second physical access circuit for sending the calculation results of the second operation circuit stored in the memory area 631 to the off-chip memory 504.
  • the third TSV 636 spreads over the entire memory layer 63 (shown on one side only as an example) and is used to electrically connect specific elements.
  • the first computing circuit and the second computing circuit perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  • when the computing device 501 intends to transmit data to the processing device 503, the data travels the following path: the first computing circuit of the first computing area 611 → the first transceiver circuit of the first die-to-die area 612 → the first TSV 613 → the second transceiver circuit of the second die-to-die area 622 → the second computing circuit of the second computing area 621; when the processing device 503 intends to transmit data to the computing device 501, the data reaches the computing device 501 through the aforementioned reverse path. It should be noted that some specific TSVs among the first TSVs 613 are dedicated to electrically connecting the first transceiver circuit and the second transceiver circuit.
  • the memory area 631 transmits the data to other devices through the first input and output circuit. Specifically, when the data in the memory area 631 is to be transmitted to other devices off-chip, the data reaches other devices off-chip through the following path: the input-output circuit of the first input-output area 632 ⁇ the third TSV 636; When other off-chip devices want to exchange data with the computing device 501 , the data arrives at the memory area 631 through the aforementioned reverse path.
  • the memory area 631 transmits the data to other devices through the second input and output circuit. Specifically, when the data in the memory area 631 is to be transmitted to other off-chip devices, the data reaches other off-chip devices through the following path: the input-output circuit of the second input-output area 633 ⁇ the third TSV 636; When other off-chip devices want to exchange data with the processing device 503 , the data arrives at the memory area 631 through the aforementioned reverse path.
  • some specific TSVs among the third TSVs 636 are dedicated to electrically conducting the data of the first and second input-output circuits.
  • the memory area 631 transmits the data to the off-chip memory 504 through the first physical access circuit. Specifically, when the data in the memory area 631 is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the first physical access circuit of the first physical area 634 ⁇ the third TSV 636; When the external memory 504 intends to transmit input data to the memory area 631 for processing by the computing device 501 , the data arrives at the memory area 631 through the aforementioned reverse path.
  • the memory area 631 transmits the data to the off-chip memory 504 through the second physical access circuit. Specifically, when the data in the memory area 631 is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the second physical access circuit of the second physical area 635 ⁇ the third TSV 636; When the external memory 504 intends to transmit input data to the memory area 631 for processing by the processing device 503 , the data arrives at the memory area 631 through the aforementioned reverse path.
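Because the single memory layer 63 serves both core layers, the path a memory-to-off-chip transfer takes depends on which device issued it: requests for the computing device 501 use the first physical access circuit (area 634), and requests for the processing device 503 use the second (area 635). A minimal sketch, with function and key names assumed for illustration:

```python
# Illustrative path selection in the shared memory layer 63 of FIG. 6.
def memory_to_offchip_path(requester: str) -> list[str]:
    """Return the hop sequence from memory area 631 to off-chip memory 504."""
    phy = {
        "computing_device": "phy_circuit_634",   # first physical access circuit
        "processing_device": "phy_circuit_635",  # second physical access circuit
    }[requester]
    return ["memory_area_631", phy, "tsv_636", "off_chip_memory_504"]

# Input data travels the same hops in reverse on its way in.
assert memory_to_offchip_path("computing_device")[1] == "phy_circuit_634"
```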
  • some specific TSVs among the third TSVs 636 are dedicated to electrically conducting the data of the first and second physical access circuits.
  • the first die-to-die area 612 and the second die-to-die area 622 are vertically stacked, so that the die-to-die interface of the first core layer 61 and the die-to-die interface of the second core layer 62 are directly electrically connected through the first TSV 613, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 7 shows a schematic diagram of vertical stacking of this embodiment.
  • this embodiment is also a multi-core chip, comprising a first core layer 71, a first memory layer 72, a second core layer 73 and a second memory layer 74, which are in practice vertically stacked together in that order; the layers in FIG. 7 are drawn separated only for convenience of illustration.
  • the first core layer 71 includes a first operation area 711, which occupies the entire logic layer of the first core layer 71 (the top side of the first core layer 71 in the figure); the first core layer 71 also includes, in a dedicated area, the first die-to-die area 712 and the first TSV 713.
  • the second core layer 73 includes a second operation area 731, which occupies the entire logic layer of the second core layer 73 (the top side of the second core layer 73 in the figure); the second core layer 73 also includes, in a dedicated area, the second die-to-die area 732 and the second TSV 733.
  • their functions and effects are the same as those of the foregoing embodiments, so details are not repeated.
  • the first memory layer 72 includes a first memory area 721 , a first I/O area 722 , a first physical area 723 and a third TSV 724 .
  • the first memory area 721 is formed with a storage unit for temporarily storing the operation result of the first operation circuit.
  • the first input-output area 722 is formed with a first input-output circuit, which is used as an interface for the first core layer 71 to communicate with the first memory layer 72 , that is, to realize the function of the interface device 502 .
• the first physical area 723 is formed with a first physical access circuit for accessing the off-chip memory 504.
• the third through-silicon vias 724 are distributed over the entire first memory layer 72 (only one side is shown by way of example) and are used to electrically connect specific components.
  • the second memory layer 74 includes a second memory area 741 , a second I/O area 742 , a second physical area 743 and a fourth TSV 744 .
  • the second memory area 741 is formed with a storage unit for temporarily storing the operation result of the second operation circuit.
• the second input-output area 742 is formed with a second input-output circuit, which serves as the interface for the die group containing the second core layer 73 and the second memory layer 74 to communicate with the outside world, that is, realizes the function of the interface device 502.
  • the second physical area 743 has a second physical access circuit for accessing the off-chip memory 504 .
• the fourth through-silicon vias 744 are distributed over the entire second memory layer 74 (only one side is shown by way of example) and are used to electrically connect specific elements.
• where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs and physical TSVs.
• the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
• the input-output TSVs are used to electrically conduct the data of the input-output circuits;
• the physical TSVs are used to electrically conduct the operation results of the computing circuits to the off-chip memory 504.
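The three TSV roles above can be summarized in a small sketch. This is purely illustrative: the names `TsvRole` and `route_for`, and the traffic labels, are invented for this example and do not appear in the patent.

```python
from enum import Enum

class TsvRole(Enum):
    TRANSCEIVER = "transceiver"    # links the first and second transceiver circuits
    INPUT_OUTPUT = "input_output"  # conducts input-output circuit data
    PHYSICAL = "physical"          # conducts operation results to off-chip memory 504

def route_for(traffic):
    """Map a kind of traffic to the TSV class it uses (illustrative table)."""
    table = {
        "core_to_core": TsvRole.TRANSCEIVER,   # die-to-die transfers
        "external_io": TsvRole.INPUT_OUTPUT,   # interface device 502 traffic
        "dram_access": TsvRole.PHYSICAL,       # off-chip memory 504 traffic
    }
    return table[traffic]
```

The point of the classification is that each traffic kind stays on its own vertical conductor class, so the three paths described below never share a via.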
• when the computing device 501 intends to transmit data to the processing device 503, the data reaches the processing device 503 through the following path: the first computing circuit of the first computing area 711 → the first transceiver circuit of the first die-to-die area 712 → the transceiver TSV of the first TSV 713 → the transceiver TSV of the third TSV 724 → the second transceiver circuit of the second die-to-die area 732 → the second computing circuit of the second computing area 731; when the processing device 503 intends to transmit data to the computing device 501, the data reaches the computing device 501 through the reverse of this path.
• when the data of the first memory area 721 is to be transmitted to other off-chip devices, the data reaches them through the following path: the first input-output circuit of the first input-output area 722 → the input-output TSV of the third TSV 724 → the input-output TSV of the second TSV 733 → the input-output TSV of the fourth TSV 744; when other off-chip devices want to transmit data to the first memory area 721, the data arrives at the first memory area 721 through the reverse of this path.
• when the data of the second memory area 741 is to be transmitted to other off-chip devices, the data reaches them through the following path: the second input-output circuit of the second input-output area 742 → the input-output TSV of the fourth TSV 744; when other off-chip devices want to transmit data to the second memory area 741, the data reaches the second memory area 741 through the reverse of this path.
• when the data in the first memory area 721 is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the first physical access circuit of the first physical area 723 → the physical TSV of the third TSV 724 → the physical TSV of the second TSV 733 → the physical TSV of the fourth TSV 744; when the off-chip memory 504 intends to transmit input data to the first memory area 721 for processing by the computing device 501, the data reaches the first memory area 721 through the reverse of this path.
• when the data in the second memory area 741 is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the second physical access circuit of the second physical area 743 → the physical TSV of the fourth TSV 744; when the off-chip memory 504 intends to transmit input data to the second memory area 741 for processing by the processing device 503, the data arrives at the second memory area 741 through the reverse of this path.
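The transfer paths above all follow the same pattern: a fixed ordered chain of circuits and TSVs, with reverse traffic traversing the same chain backwards. A minimal sketch, with the stop labels and the `transfer` helper invented for illustration:

```python
# Ordered stops of the core-to-core path of FIG. 7 (labels paraphrase the text above).
CORE1_TO_CORE2 = [
    "first computing circuit (711)",
    "first transceiver circuit (712)",
    "transceiver TSV of first TSV (713)",
    "transceiver TSV of third TSV (724)",
    "second transceiver circuit (732)",
    "second computing circuit (731)",
]

def transfer(path, data, forward=True):
    """Return the (stop, data) hops; reverse traffic walks the same path backwards."""
    hops = path if forward else list(reversed(path))
    return [(stop, data) for stop in hops]

# Computing device 501 -> processing device 503:
down = transfer(CORE1_TO_CORE2, "result")
# Processing device 503 -> computing device 501 reuses the path in reverse:
up = transfer(CORE1_TO_CORE2, "input", forward=False)
```

The input-output and physical paths differ only in their stop lists, which is why the text can describe each reverse direction simply as "the aforementioned reverse path".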
  • the first core layer 71 is used in conjunction with the first memory layer 72
  • the second core layer 73 is used in conjunction with the second memory layer 74.
• the first core layer 71 and the first memory layer 72 adopt a face-to-face bonding process, which makes the transmission path between the first computing circuit and the first memory area 721 the shortest;
• the second core layer 73 and the second memory layer 74 adopt a face-to-face bonding process, which likewise makes the transmission path between the second computing circuit and the second memory area 741 the shortest.
  • the first memory layer 72 and the second core layer 73 adopt a back-to-back bonding process.
• the first die-to-die region 712 and the second die-to-die region 732 are vertically stacked, so that the die-to-die interface of the first core layer 71 and the die-to-die interface of the second core layer 73 are directly electrically connected through the first TSV 713 and the third TSV 724, without using the interposer 201 shown in FIG. 2 for transmission.
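The bonding sequence of this embodiment can be sketched as a table of which surfaces of adjacent layers meet, where "F" stands for the logic (face) side and "B" for the back side. The pairings come from the description above; the `bond_style` helper is an illustrative assumption.

```python
# Adjacent-layer pairings of the FIG. 7 stack and the surfaces that meet.
STACK = [
    ("first core layer 71", "first memory layer 72", ("F", "F")),   # face-to-face
    ("first memory layer 72", "second core layer 73", ("B", "B")),  # back-to-back
    ("second core layer 73", "second memory layer 74", ("F", "F")), # face-to-face
]

def bond_style(surfaces):
    """Name the bonding process implied by which two surfaces meet."""
    return {("F", "F"): "face-to-face",
            ("B", "B"): "back-to-back",
            ("F", "B"): "face-to-back",
            ("B", "F"): "face-to-back"}[surfaces]

styles = [bond_style(s) for _, _, s in STACK]
```

Face-to-face pairings put the two logic layers directly against each other, which is why each core layer's transmission path to its companion memory area is the shortest.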
  • FIG. 8 shows a schematic diagram of vertical stacking in this embodiment.
  • the multi-core chip of this embodiment includes a first core layer 81, a first memory layer 82, a second core layer 83, a second memory layer 84, a third memory layer 85, and a fourth memory layer 86.
• the multi-core chip of this embodiment is divided into a first die group and a second die group, the first die group being stacked on the second die group; the first die group, from top to bottom, comprises the third memory layer 85, the first core layer 81 and the first memory layer 82, and the second die group, from top to bottom, comprises the fourth memory layer 86, the second core layer 83 and the second memory layer 84; that is, the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83.
  • the layers in FIG. 8 are visually separated up and down and shown in this way for convenience of illustration only.
• the first core layer 81, the first memory layer 82, the second core layer 83 and the second memory layer 84 are the same as the first core layer 71, the first memory layer 72, the second core layer 73 and the second memory layer 74 of the foregoing embodiment, so details are not repeated here.
  • the third memory layer 85 includes a third memory area 851 and a fifth TSV 852 , the third memory area 851 covers the logic layer of the third memory layer 85 , that is, the top side of the third memory layer 85 in the figure.
  • the third memory area 851 is formed with storage units for temporarily storing the calculation results of the first calculation circuit.
• the fifth through-silicon vias 852 are distributed over the entire third memory layer 85 (only one side is shown by way of example) and are used to electrically connect specific components.
• the third memory layer 85 is only responsible for temporarily storing the operation results of the first computing circuit, and is not responsible for the first die group's communication with the outside.
  • the first computing circuit can use the temporary storage space of the first memory area 821 and the third memory area 851, and when the computing device 501 wants to temporarily store intermediate data, it can temporarily store it to the third memory area 851 through the fifth TSV 852, Or it is temporarily stored in the first memory area 821 through the first TSV 813 .
  • the fourth memory layer 86 includes a fourth memory area 861 and sixth TSVs 862 .
  • the fourth memory area 861 covers the logical layer of the fourth memory layer 86 , ie the top side of the fourth memory layer 86 in the figure.
  • the fourth memory area 861 has storage units for temporarily storing the operation results of the second operation circuit.
• the sixth through-silicon vias 862 are distributed over the entire fourth memory layer 86 (only one side is shown by way of example) and are used to electrically connect specific components.
• the fourth memory layer 86 is only responsible for temporarily storing the operation results of the second computing circuit, and is not responsible for the second die group's communication with the outside.
  • the second arithmetic circuit can use the temporary storage space of the second memory area 841 and the fourth memory area 861, and when the processing device 503 wants to temporarily store intermediate data, it can temporarily store it to the fourth memory area 861 through the sixth TSV 862, Or it is temporarily stored in the second memory area 841 through the second TSV 833 .
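As described above, each arithmetic circuit can spill intermediate data to either of two scratch memories (e.g. the fourth memory area 861 via the sixth TSV 862, or the second memory area 841 via the second TSV 833). One possible selection policy can be sketched as follows; the capacities, the nearest-first ordering, and `pick_scratch` are all invented for illustration and are not specified by the patent.

```python
def pick_scratch(size, memories):
    """Pick the first memory (listed nearest-first) with room for `size` bytes."""
    for name, free in memories:
        if free >= size:
            return name
    raise MemoryError("no scratch memory can hold the intermediate data")

# Second arithmetic circuit: the fourth memory area 861 (reached through the
# sixth TSV 862) is listed before the second memory area 841 (second TSV 833).
choice = pick_scratch(4096, [("fourth memory area 861", 2048),
                             ("second memory area 841", 8192)])
```

With these invented numbers the nearer memory is full, so the spill falls through to the second memory area 841; a smaller buffer would stay in the nearer fourth memory area 861.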
• where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs and physical TSVs.
• the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
• the input-output TSVs are used to electrically conduct the data of the input-output circuits;
• the physical TSVs are used to electrically conduct the operation results of the computing circuits to the off-chip memory 504.
• when the computing device 501 intends to transmit data to the processing device 503, the data reaches the processing device 503 through the following path: the first computing circuit of the first computing area 811 → the first transceiver circuit of the first die-to-die area 812 → the transceiver TSV of the first TSV 813 → the transceiver TSV of the third TSV 824 → the transceiver TSV of the sixth TSV 862 → the second transceiver circuit of the second die-to-die area 832 → the second computing circuit of the second computing area 831; when the processing device 503 intends to transmit data to the computing device 501, the data reaches the computing device 501 through the reverse of this path.
• when the data of the first die group is to be transmitted to other off-chip devices, the data reaches them through the following path: the first input-output circuit of the first input-output area 822 → the input-output TSV of the third TSV 824 → the input-output TSV of the sixth TSV 862 → the input-output TSV of the second TSV 833 → the input-output TSV of the fourth TSV 844; when other off-chip devices want to transmit data to the first die group, the data arrives at the first memory area 821 through the reverse of this path.
• when the data of the second die group is to be transmitted to other off-chip devices, the data reaches them through the following path: the second input-output circuit of the second input-output area 842 → the input-output TSV of the fourth TSV 844; when other off-chip devices want to transmit data to the second die group, the data arrives at the second memory area 841 through the reverse of this path.
• when the data of the first die group is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the first physical access circuit of the first physical area 823 → the physical TSV of the third TSV 824 → the physical TSV of the sixth TSV 862 → the physical TSV of the second TSV 833 → the physical TSV of the fourth TSV 844; when the off-chip memory 504 intends to transmit input data to the first die group for processing by the computing device 501, the data arrives at the first memory area 821 through the reverse of this path.
• when the data of the second die group is to be transmitted to the off-chip memory 504, the data reaches the off-chip memory 504 through the following path: the second physical access circuit of the second physical area 843 → the physical TSV of the fourth TSV 844; when the off-chip memory 504 intends to transmit input data to the second die group for processing by the processing device 503, the data arrives at the second memory area 841 through the reverse of this path.
  • the first core layer 81 is used in conjunction with the first memory layer 82 and the third memory layer 85
  • the second core layer 83 is used in conjunction with the second memory layer 84 and the fourth memory layer 86.
  • the first core layer 81 and the first memory layer 82 adopt a face-to-face bonding process, so that the transmission path between the first computing circuit and the first memory area 821 is the shortest
• the first core layer 81 and the third memory layer 85 adopt a face-to-back bonding process;
  • the first memory layer 82 and the fourth memory layer 86 adopt a back-to-back bonding process
• the second core layer 83 and the fourth memory layer 86 adopt a face-to-face bonding process, which likewise makes the transmission path between the second computing circuit and the fourth memory area 861 the shortest;
  • the second core layer 83 and the second memory layer 84 adopt a face-to-back bonding process.
• the first die-to-die region 812 and the second die-to-die region 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 and the die-to-die interface of the second core layer 83 are directly electrically connected through the first TSV 813, the third TSV 824 and the sixth TSV 862, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 9 shows a schematic diagram of vertical stacking in this embodiment.
  • the multi-core chip of this embodiment is stacked from top to bottom and divided into a first die group, a second die group and a third die group.
  • the first die group is respectively the first core layer 91 and the first memory layer 92 from top to bottom
  • the second die group is respectively the second core layer 93 and the second memory layer 94 from top to bottom
• the third die group only includes the third memory layer 95, so the third memory layer 95 is located under the second memory layer 94.
  • the layers in FIG. 9 are visually separated up and down and shown in this way for convenience of illustration only.
• the first core layer 91 includes a first computing area 911, which covers the logic layer of the first core layer 91, that is, the top side of the first core layer 91 in the figure; the first core layer 91 also includes, in a special area, a first die-to-die region 912 and a first through-silicon via 913.
  • the first memory area 921 has storage units for temporarily storing the calculation results of the first calculation circuit.
• the second core layer 93 includes a second computing area 931, which covers the logic layer of the second core layer 93, that is, the top side of the second core layer 93 in the figure; the second core layer 93 also includes, in a special area, a second die-to-die region 932.
• the third memory layer 95 includes a third memory area 951, a first input-output area 952, a second input-output area 953, a first physical area 954, a second physical area 955 and a fifth TSV 956.
• the third memory area 951 is formed with storage units for temporarily storing the operation results of the first computing circuit or the second computing circuit.
• the first input-output area 952 is formed with a first input-output circuit, which serves as the interface for the first die group to communicate with the outside world, that is, realizes the function of the interface device 502.
• the second input-output area 953 is formed with a second input-output circuit, which serves as the interface for the second die group to communicate with the outside world, that is, realizes the function of the interface device 502.
• the first physical area 954 is formed with a first physical access circuit for connecting the first die group and the off-chip memory 504, and the second physical area 955 is formed with a second physical access circuit for connecting the second die group and the off-chip memory 504.
• the TSVs are distributed throughout the entire layer (only one side is shown by way of example); where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs and physical TSVs.
• the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
• the input-output TSVs are used to electrically conduct the data of the input-output circuits;
• the physical TSVs are used to electrically conduct the operation results of the computing circuits to the off-chip memory 504.
• when the computing device 501 intends to transmit data to the processing device 503, the data reaches the processing device 503 through the following path: the first computing circuit of the first computing area 911 → the first transceiver circuit of the first die-to-die area 912 → the transceiver TSV of the first TSV 913 → the transceiver TSV of the second TSV 922 → the second transceiver circuit of the second die-to-die area 932 → the second computing circuit of the second computing area 931; when the processing device 503 intends to transmit data to the computing device 501, the data reaches the computing device 501 through the reverse of this path.
• the first die group and the second die group are not directly connected off-chip; when they need to communicate off-chip, this embodiment does so through the third memory layer 95 of the third die group.
• when the calculation result of the computing device 501 needs to be exchanged with other off-chip devices through the interface device 502, the data is first transmitted through the input-output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices through the following path: the first input-output circuit of the first input-output area 952 → the first input-output TSV of the fifth TSV 956; when other off-chip devices want to transmit data to the first die group, the data is temporarily stored in the third memory area 951 through the reverse of this path, and then transmitted from the third memory area 951 to the first memory area 921.
• when the calculation result of the processing device 503 needs to be exchanged with other off-chip devices through the interface device 502, the data is first transmitted through the input-output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices through the following path: the second input-output circuit of the second input-output area 953 → the second input-output TSV of the fifth TSV 956; when other off-chip devices want to transmit data to the second die group, the data is temporarily stored in the third memory area 951 through the reverse of this path, and then transmitted from the third memory area 951 to the second memory area 941.
• when the data in the first memory area 921 is to be transmitted to the off-chip memory 504, the data is first transmitted through the physical TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the off-chip memory 504 through the following path: the first physical access circuit of the first physical area 954 → the first physical TSV of the fifth TSV 956; when the off-chip memory 504 intends to transmit input data to the first die group, the input data is temporarily stored in the third memory area 951 through the reverse of this path, and then transmitted from the third memory area 951 to the first memory area 921.
• when the data in the second memory area 941 is to be transmitted to the off-chip memory 504, the data is first transmitted through the physical TSV of the fourth TSV to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the off-chip memory 504 through the following path: the second physical access circuit of the second physical area 955 → the second physical TSV of the fifth TSV 956; when the off-chip memory 504 intends to transmit input data to the second die group, the input data is temporarily stored in the third memory area 951 through the reverse of this path, and then transmitted through the physical TSV of the fourth TSV from the third memory area 951 to the second memory area 941.
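In this embodiment every off-chip transfer is staged through the third memory area 951 before leaving or entering the chip. A minimal sketch of that staging pattern, with the `StagingBuffer` class and its method names invented for illustration:

```python
class StagingBuffer:
    """Stands in for the third memory area 951: all off-chip traffic passes here."""
    def __init__(self):
        self.slots = []

    def stage(self, src, data):
        # Data arrives over the input-output or physical TSVs and is held here.
        self.slots.append((src, data))

    def drain_to(self, dst):
        # Held data is forwarded to its off-chip destination, emptying the buffer.
        out = [(dst, data) for _, data in self.slots]
        self.slots.clear()
        return out

buf = StagingBuffer()
buf.stage("first memory area 921", "calc result")  # via the physical TSVs
delivered = buf.drain_to("off-chip memory 504")    # via the first physical area 954
```

Inbound traffic mirrors this: data from the off-chip memory 504 is staged in the buffer first, then forwarded up to the first or second memory area.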
  • the first core layer 91 is used in conjunction with the first memory layer 92
  • the second core layer 93 is used in conjunction with the second memory layer 94.
• the first core layer 91 and the first memory layer 92 adopt a face-to-face bonding process, which makes the transmission path between the first computing circuit and the first memory area 921 the shortest; the second core layer 93 and the second memory layer 94 likewise adopt a face-to-face bonding process, which makes the transmission path between the second computing circuit and the second memory area 941 the shortest.
  • the first memory layer 92 and the second core layer 93 adopt a back-to-back bonding process
  • the second memory layer 94 and the third memory layer 95 adopt a face-to-back bonding process.
• the first die-to-die region 912 and the second die-to-die region 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are directly electrically connected through the first TSV 913 and the second TSV 922, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 10 shows a schematic diagram of vertical stacking of this embodiment.
  • the multi-core chip of this embodiment is stacked from top to bottom and divided into a first die group, a second die group and a third die group.
  • the first die group is respectively the third memory layer B and the first core layer A from top to bottom
  • the second die group is respectively the first memory layer D and the second core layer C from top to bottom
• the third die group includes only the second memory layer E.
  • the only difference between the vertical stacking structure of this embodiment and the embodiment in FIG. 9 is that the positions of the core layer and the memory layer of the first die group and the second die group are swapped.
  • FCBGA flip chip ball grid array
  • CoWoS chip on wafer on substrate packaging technology
• FCBGA (flip-chip ball grid array) uses small solder balls instead of pins to connect circuits, which provides the shortest external connection distance.
• CoWoS (chip on wafer on substrate) is an integrated production technology: the dies are first attached to a silicon wafer through the CoW packaging process, and the CoW assembly is then attached to the substrate to form CoWoS. This technology allows multiple dies to be packaged together, achieving a small package volume, low power consumption and a small pin count.
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 4 , and its flow chart is shown in FIG. 11 .
• a first core layer 41 is formed; the first core layer includes a first computing region 411 and a first die-to-die region 412, wherein a first computing circuit is formed in the first computing region 411 and a first transceiver circuit is formed in the first die-to-die region 412.
  • a transceiver TSV is formed in the first core layer 41 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a second core layer 42 is formed; the second core layer includes a second computing region 421 and a second die-to-die region 422, wherein a second computing circuit is formed in the second computing region 421 and a second transceiver circuit is formed in the second die-to-die region 422.
  • the first core layer 41 and the second core layer 42 are vertically stacked, and the first operation circuit and the second operation circuit perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit.
• the first die-to-die region 412 and the second die-to-die region 422 are vertically stacked, so that the die-to-die interface of the first core layer 41 and the die-to-die interface of the second core layer 42 are directly electrically connected through the first TSV 413, without using the interposer 201 shown in FIG. 2 for transmission.
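The flow of FIG. 11 can be sketched as an ordered checklist. The step names paraphrase the description above, and the `run` checker is an illustrative assumption (the patent specifies the order, not a checking procedure):

```python
# Ordered fabrication steps paraphrasing the FIG. 11 flow.
STEPS = [
    "form first core layer 41 (first computing circuit, first transceiver circuit)",
    "form transceiver TSV in first core layer 41",
    "form second core layer 42 (second computing circuit, second transceiver circuit)",
    "stack first core layer 41 on second core layer 42",
]

def run(steps):
    """Walk the steps in order and return the log; stacking must come last,
    since both layers must exist before they can be vertically stacked."""
    log = []
    for step in steps:
        log.append(step)
    assert log[-1].startswith("stack"), "layers must be stacked after both are formed"
    return log

log = run(STEPS)
```

The later flows (FIGS. 12 through 15) follow the same shape, with the memory layers formed and interleaved into the stacking order at the appropriate points.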
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 6 , and its flow chart is shown in FIG. 12 .
• a first core layer 61 is formed; the first core layer 61 includes a first computing region 611 and a first die-to-die region 612, wherein a first computing circuit is formed in the first computing region 611 and a first transceiver circuit is formed in the first die-to-die region 612.
  • a transceiver TSV is formed in the first core layer 61 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a memory layer 63 is formed, and a memory area 631, an input-output area 632, a first physical area 634 and TSVs 624 are formed in the memory layer 63.
• a storage unit is formed in the memory area 631 for temporarily storing the operation results of the first computing circuit and the second computing circuit; an input-output circuit is formed in the input-output area 632 to serve as the interface for the multi-core chip to communicate with the outside world; a physical access circuit is formed in the first physical area 634 for accessing the off-chip memory 504.
  • the TSV 624 is used to electrically connect the first transceiver circuit and the second transceiver circuit.
  • the transceiver TSVs are formed in the memory layer 63 to electrically connect the first transceiver circuit and the second transceiver circuit. Specifically, part of the TSVs 624 are configured as transceiver TSVs.
• a second core layer 62 is formed; the second core layer 62 includes a second computing area 621 and a second die-to-die area 622, wherein a second computing circuit is formed in the second computing area 621 and a second transceiver circuit is formed in the second die-to-die area 622.
  • the first core layer 61 , the memory layer 63 and the second core layer 62 are stacked sequentially, that is, the memory layer 63 is formed between the first core layer 61 and the second core layer 62 .
• the first die-to-die region 612 and the second die-to-die region 622 are vertically stacked, so that the die-to-die interface of the first core layer 61 and the die-to-die interface of the second core layer 62 are directly electrically connected through the first TSV 613 and the third TSV 636, without using the interposer 201 shown in FIG. 2 for transmission.
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 7 , and its flow chart is shown in FIG. 13 .
• a first core layer 71 is formed; the first core layer 71 includes a first computing region 711 and a first die-to-die region 712, wherein a first computing circuit is formed in the first computing region 711 and a first transceiver circuit is formed in the first die-to-die region 712.
  • a transceiver TSV is formed in the first core layer 71 for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • a first memory layer 72 is generated.
• the first memory layer 72 includes a first memory area 721, in which a storage unit is formed for temporarily storing the operation results of the first computing circuit.
  • a transceiver TSV is formed in the first memory layer 72 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a second core layer 73 is formed; the second core layer 73 includes a second computing region 731 and a second die-to-die region 732, wherein a second computing circuit is formed in the second computing region 731 and a second transceiver circuit is formed in the second die-to-die region 732.
  • a second memory layer 74 is generated.
• the second memory layer 74 includes a second memory area 741, in which a storage unit is formed for temporarily storing the operation results of the second computing circuit.
  • a transceiver TSV is formed in the second memory layer 74 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• the first core layer 71, the first memory layer 72, the second core layer 73 and the second memory layer 74 are stacked in sequence; more specifically, the first die-to-die region 712 and the second die-to-die region 732 are vertically stacked, so that the die-to-die interface of the first core layer 71 and the die-to-die interface of the second core layer 73 are directly electrically connected through the transceiver TSVs, without using the interposer 201 shown in FIG. 2 for transmission.
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 8 , and its flow chart is shown in FIG. 14 .
  • a third memory layer 85 is generated.
• the third memory layer 85 includes a third memory area 851, in which a storage unit is formed for temporarily storing the operation results of the first computing circuit.
• a first core layer 81 is formed; the first core layer 81 includes a first computing region 811 and a first die-to-die region 812, wherein a first computing circuit is formed in the first computing region 811 and a first transceiver circuit is formed in the first die-to-die region 812.
  • a transceiver TSV is formed in the first core layer 81 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a first memory layer 82 is formed; the first memory layer 82 includes a first memory area 821, in which a storage unit is formed for temporarily storing the operation results of the first computing circuit.
  • a transceiver TSV is formed in the first memory layer 82 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a fourth memory layer 86 is formed; the fourth memory layer 86 includes a fourth memory area 861, in which a storage unit is formed for temporarily storing the operation results of the second computing circuit; the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83.
  • a transceiver TSV is formed in the fourth memory layer 86 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• a second core layer 83 is formed; the second core layer 83 includes a second computing region 831 and a second die-to-die region 832, wherein a second computing circuit is formed in the second computing region 831 and a second transceiver circuit is formed in the second die-to-die region 832.
• in step 1406, a second memory layer 84 is formed; the second memory layer 84 includes a second memory area 841, in which a storage unit is formed for temporarily storing the operation results of the second computing circuit.
  • a transceiver TSV is formed in the second memory layer 84 for electrically connecting the first transceiver circuit and the second transceiver circuit.
• the third memory layer 85, the first core layer 81, the first memory layer 82, the fourth memory layer 86, the second core layer 83 and the second memory layer 84 are stacked in sequence; more specifically, the first die-to-die region 812 and the second die-to-die region 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 and the die-to-die interface of the second core layer 83 are directly electrically connected through the transceiver TSVs, without using the interposer 201 shown in FIG. 2 for transmission.
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 9 , and its flow chart is shown in FIG. 15 .
• a first core layer 91 is formed; the first core layer 91 includes a first computing region 911 and a first die-to-die region 912, wherein a first computing circuit is formed in the first computing region 911 and a first transceiver circuit is formed in the first die-to-die region 912.
  • a transceiver TSV is formed in the first core layer 91 for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • a first memory layer 92 is generated.
  • the first memory layer 92 includes a first memory area 921 and a storage unit is generated for temporarily storing the operation result of the first operation circuit.
  • a transceiver TSV is formed in the first memory layer 92 for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • a second nuclear layer 93 is generated, and the second nuclear layer 93 includes a second computing region 931 and a second grain-to-grain region 932, wherein the second computing region 931 is generated with a second computing circuit, and the second crystal The die-to-die region 932 is formed with a second transceiver circuit.
  • a second memory layer 94 is generated.
  • the second memory layer 94 includes a second memory area 941 and a storage unit is generated to temporarily store the operation result of the second operation circuit.
  • a third memory layer 95 is generated, the third memory layer 95 includes a third memory area 951, and a storage unit is generated to temporarily store the operation results of the first operation circuit or the second operation circuit, wherein the third memory Layer 95 is located below second memory layer 94 .
  • the first core layer 91, the first memory layer 92, the second core layer 93, the second memory layer 94, and the third memory layer 95 are stacked in sequence. More specifically, the first die-to-die area 912 and the second die-to-die area 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are directly electrically connected through the transceiver TSVs; this connection does not require the interposer 201 shown in FIG. 2 for transmission.
  • Another embodiment of the present invention is a method for manufacturing a multi-core chip as shown in FIG. 10 , and its flow chart is shown in FIG. 16 .
  • a third memory layer B is generated; the third memory layer B includes a third memory area 1021, in which a storage unit is generated to temporarily store the operation results of the first operation circuit.
  • a first core layer A is generated; the first core layer A includes a first operation area 1011 and a first die-to-die area 1012, wherein a first operation circuit is formed in the first operation area 1011 and a first transceiver circuit is formed in the first die-to-die area 1012.
  • transceiver TSVs are formed in the first core layer A for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • a first memory layer D is generated; the first memory layer D includes a first memory area 1041, in which a storage unit is generated to temporarily store the operation results of the second operation circuit.
  • transceiver TSVs are formed in the first memory layer D for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • a second core layer C is generated; the second core layer C includes a second operation area 1031 and a second die-to-die area 1032, wherein a second operation circuit is formed in the second operation area 1031 and a second transceiver circuit is formed in the second die-to-die area 1032.
  • a second memory layer E is generated; the second memory layer E includes a second memory area 1051, in which a storage unit is generated to temporarily store the operation results of the first operation circuit or the second operation circuit.
  • the third memory layer B, the first core layer A, the first memory layer D, the second core layer C, and the second memory layer E are stacked in sequence. More specifically, the first die-to-die area 1012 and the second die-to-die area 1032 are vertically stacked, so that the die-to-die interface of the first core layer A and the die-to-die interface of the second core layer C are directly electrically connected through the transceiver TSVs; this connection does not require the interposer 201 shown in FIG. 2 for transmission.
  • In the solution of the present invention, the core layers are stacked vertically so that the die-to-die areas of the core layers are also vertically stacked; the two die-to-die interfaces do not need to pass through an interposer but instead use through-silicon vias for data transmission.
  • The transmission path of the die-to-die interface is thus greatly shortened, which helps improve the transmission efficiency between cores.
  • The electronic equipment or devices of the present invention may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, intelligent terminals, PC equipment, Internet-of-Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, visual terminals, autonomous-driving terminals, vehicles, household appliances, and/or medical equipment.
  • Said vehicles include airplanes, ships, and/or cars;
  • said household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • said medical equipment includes nuclear magnetic resonance instruments, B-ultrasound scanners, and/or electrocardiographs.
  • The electronic equipment or devices of the present invention can also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic equipment or devices of the present invention can also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as cloud, edge, and terminal scenarios.
  • Electronic equipment or devices with high computing power according to the solution of the present invention can be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption can be applied to terminal devices and/or edge devices (such as smartphones or cameras).
  • The hardware information of the cloud device and the hardware information of the terminal device and/or edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or edge device, to simulate the hardware resources of the terminal device and/or edge device, thereby completing the unified management, scheduling, and collaborative work of device-cloud integration or cloud-edge-terminal integration.
  • The present invention expresses some methods and their embodiments as a series of actions and combinations thereof, but those skilled in the art can understand that the solution of the present invention is not limited by the order of the described actions. Therefore, according to the disclosure or teaching of the present invention, those skilled in the art can understand that some of the steps may be performed in another order or simultaneously. Further, those skilled in the art can understand that the embodiments described in the present invention may be regarded as optional embodiments, that is, the actions or modules involved therein are not necessarily required for the realization of one or some solutions of the present invention. In addition, depending on the scheme, the descriptions of some embodiments of the present invention have different emphases. In view of this, for parts not described in detail in a certain embodiment of the present invention, those skilled in the art may refer to the relevant descriptions of other embodiments.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit.
  • the aforementioned components or units may be located at the same location or distributed over multiple network units.
  • some or all of the units may be selected to achieve the purpose of the solutions described in the embodiments of the present invention.
  • multiple units in this embodiment of the present invention may be integrated into one unit, or each unit exists physically independently.
  • the above-mentioned integrated units may also be implemented in the form of hardware, that is, specific hardware circuits, which may include digital circuits and/or analog circuits.
  • the physical realization of the hardware structure of the circuit may include but not limited to physical devices, and the physical devices may include but not limited to devices such as transistors or memristors.
  • various devices such as computing devices or other processing devices described herein may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs.
  • The aforementioned storage unit or storage device can be any suitable storage medium (including a magnetic storage medium or magneto-optical storage medium, etc.), which can be, for example, resistive random-access memory (RRAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random-access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), ROM, RAM, etc.
  • Clause A1 A multi-core chip comprising: a first core layer, including a first operation area, in which a first operation circuit is generated, and a first die-to-die area, in which a first transceiver circuit is generated; and a second core layer, including a second operation area, in which a second operation circuit is generated, and a second die-to-die area, in which a second transceiver circuit is generated; wherein the first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit.
  • Clause A2 The multi-core chip according to Clause A1, connected to off-chip memory, further comprising a memory layer, the memory layer comprising: a memory area, in which a storage unit is generated for temporarily storing the operation results of the first operation circuit and the second operation circuit; an input-output area, in which an input-output circuit is generated to serve as an interface of the multi-core chip for external communication; and a physical area, in which a physical access circuit is generated for accessing the off-chip memory.
  • Clause A3 The multi-core chip according to Clause A2, wherein the memory layer is located between the first core layer and the second core layer, and through-silicon vias are formed in the memory layer for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A4 The multi-core chip of Clause A2, wherein the memory area is located between the first core layer and the second core layer, and through-silicon vias are formed in the second core layer for electrically conducting data of the input-output circuit.
  • Clause A5 The multi-core chip of Clause A2, wherein the memory area is located between the first core layer and the second core layer, and through-silicon vias are formed in the second core layer for electrically conducting data of the physical access circuit.
  • Clause A7 The multi-core chip according to Clause A6, wherein the first memory layer further includes a first input-output area in which a first input-output circuit is generated to serve as an interface of the multi-core chip for external communication, and input-output TSVs are formed in the second core layer and the second memory layer for electrically conducting data of the first input-output circuit.
  • Clause A8 The multi-core chip according to Clause A6, wherein the second memory layer further includes a second input-output area in which a second input-output circuit is generated, electrically connected to the outside of the multi-core chip through input-output through-silicon vias.
  • Clause A9 The multi-core chip of Clause A6, connected to off-chip memory, wherein said first memory layer further comprises a first physical area in which a physical access circuit is generated, and physical TSVs are formed in said second core layer and said second memory layer for electrically conducting the operation results of the first operation circuit to the off-chip memory.
  • Clause A10 The multi-core chip according to Clause A6, connected to off-chip memory, wherein the second memory layer further includes a second physical area in which a physical access circuit is generated, and the operation results of the second operation circuit are conducted to the off-chip memory through physical through-silicon vias.
  • Clause A11 The multi-core chip of Clause A6, wherein the first core layer and the first memory layer are bonded by a face-to-face process, the first memory layer and the second core layer by a back-to-back process, and the second core layer and the second memory layer by a face-to-face process.
  • Clause A12 The multi-core chip according to Clause A6, further comprising a third memory layer, the third memory layer comprising a third memory area in which a storage unit is generated for temporarily storing the operation results of the first operation circuit, wherein the third memory layer is located above the first core layer.
  • Clause A13 The multi-core chip of Clause A12, wherein the third memory layer and the first core layer are bonded by a face-to-face or face-to-back process.
  • Clause A14 The multi-core chip according to Clause A6, further comprising a fourth memory layer, the fourth memory layer comprising a fourth memory area in which a storage unit is generated for temporarily storing the operation results of the second operation circuit, wherein the fourth memory layer is located between the first memory layer and the second core layer, and transceiver through-silicon vias are formed in the fourth memory layer for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A15 The multi-core chip according to Clause A14, wherein the first memory layer further includes a first input-output area in which a first input-output circuit is generated to serve as an interface of the multi-core chip for external communication, and input-output TSVs are formed in the fourth memory layer, the second core layer, and the second memory layer for electrically conducting data of the first input-output circuit.
  • Clause A16 The multi-core chip of Clause A14, connected to off-chip memory, wherein said first memory layer further comprises a first physical area in which a physical access circuit is generated, and physical through-silicon vias are formed in said fourth memory layer, said second core layer, and said second memory layer for electrically conducting the operation results of the first operation circuit to the off-chip memory.
  • Clause A17 The multi-core chip of Clause A14, wherein the first core layer and the first memory layer are bonded by a face-to-face process, the first memory layer and the fourth memory layer by a back-to-back process, the fourth memory layer and the second core layer by a face-to-face process, and the second core layer and the second memory layer by a face-to-back process.
  • Clause A18 The multi-core chip according to Clause A6, further comprising a third memory layer comprising a third memory area in which a storage unit is generated for temporarily storing the operation results of the first operation circuit or the second operation circuit, wherein the third memory layer is located below the second memory layer.
  • Clause A19 The multi-core chip according to Clause A18, wherein the third memory layer further includes an input-output area, in which an input-output circuit is generated to serve as an interface of the multi-core chip for external communication.
  • Clause A20 The multi-core chip of Clause A18, connected to off-chip memory, wherein said third memory layer further comprises a physical area in which a physical access circuit is generated for electrically conducting the operation results of said first operation circuit and said second operation circuit to the off-chip memory.
  • Clause A21 The multi-core chip of Clause A18, wherein the first core layer and the first memory layer are bonded by a face-to-face process, the first memory layer and the second core layer by a back-to-back process, the second core layer and the second memory layer by a face-to-face process, and the second memory layer and the third memory layer by a face-to-back process.
  • Clause A22 The multi-core chip of any one of Clauses A1 to A21, wherein the layers are packaged in a flip-chip ball grid array.
  • Clause A25 A board comprising the integrated circuit device of Clause A24.
  • Clause A26 A method of making a multi-core chip, comprising: generating a first core layer, the first core layer comprising a first operation area in which a first operation circuit is generated, and a first die-to-die area in which a first transceiver circuit is generated; and generating a second core layer, the second core layer comprising a second operation area in which a second operation circuit is generated, and a second die-to-die area in which a second transceiver circuit is generated; wherein the first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit.
  • Clause A27 The method of Clause A26, the multi-core chip connected to off-chip memory, the method further comprising generating a memory layer between the first core layer and the second core layer, the memory layer comprising : a memory area, generating a storage unit for temporarily storing the operation results of the first computing circuit and the second computing circuit; an input and output area, generating an input and output circuit for external communication of the multi-core chip an interface; and a physical area, generating a physical access circuit for accessing the off-chip memory.
  • Clause A28 The method according to Clause A27, wherein the step of generating a memory layer includes forming a through-silicon via in the memory layer for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A29 The method according to Clause A26, further comprising: generating a first memory layer, including a first memory area, in which a storage unit is generated for temporarily storing the operation results of the first operation circuit; and generating a second memory layer, including a second memory area, in which a storage unit is generated for temporarily storing the operation results of the second operation circuit; wherein the first core layer, the first memory layer, the second core layer, and the second memory layer are stacked in sequence; and wherein the step of generating the first memory layer includes generating transceiver through-silicon vias in the first memory layer to electrically connect the first transceiver circuit and the second transceiver circuit.
  • Clause A30 The method according to Clause A29, further comprising generating a third memory layer, the third memory layer including a third memory area in which a storage unit is generated for temporarily storing the operation results of the first operation circuit, wherein the third memory layer is located above the first core layer.
  • Clause A31 The method according to Clause A30, further comprising generating a fourth memory layer, the fourth memory layer including a fourth memory area in which a storage unit is generated for temporarily storing the operation results of the second operation circuit, wherein the fourth memory layer is located between the first memory layer and the second core layer, and the step of generating the fourth memory layer includes generating transceiver through-silicon vias in the fourth memory layer for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A32 The method according to Clause A29, further comprising generating a third memory layer including a third memory area in which a storage unit is generated for temporarily storing the operation results of the first operation circuit or the second operation circuit, wherein the third memory layer is located below the second memory layer.
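The stacking described in Clause A29 can be sanity-checked with a small model. The following is a hypothetical sketch (not part of the claims); the layer names, area names, and the `tsv_path_exists` helper are illustrative shorthand for the reference numerals, chosen only to show that transceiver TSVs in every intervening layer are what let the two die-to-die areas connect without an interposer.

```python
# Hypothetical model of the Clause A29 stack: core1 / mem1 / core2 / mem2,
# top to bottom. The two transceiver circuits can be joined only if every
# layer strictly between the two core layers carries transceiver TSVs.

from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    areas: set = field(default_factory=set)  # functional areas on this layer
    tsvs: set = field(default_factory=set)   # TSV types formed through this layer

def tsv_path_exists(stack, upper, lower, tsv):
    """True if every layer strictly between `upper` and `lower` carries `tsv`."""
    names = [layer.name for layer in stack]
    i, j = names.index(upper), names.index(lower)
    return all(tsv in layer.tsvs for layer in stack[i + 1:j])

# Stacking order of Clause A29 (names illustrative).
stack = [
    Layer("core1", {"operation", "die-to-die"}, {"transceiver"}),
    Layer("mem1",  {"memory"},                  {"transceiver"}),  # transceiver TSVs per Clause A29
    Layer("core2", {"operation", "die-to-die"}),
    Layer("mem2",  {"memory"}),
]

# The transceiver circuits of core1 and core2 are connected through the
# transceiver TSVs of the intervening first memory layer.
assert tsv_path_exists(stack, "core1", "core2", "transceiver")
```

The same check extends naturally to the Clause A31 stack, where the fourth memory layer sits between the first memory layer and the second core layer and must therefore also carry transceiver TSVs.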


Abstract

A multi-core chip, an integrated circuit device, a board card, and a manufacturing method therefor, wherein a computing device (501) is included in the integrated circuit device, and the integrated circuit device includes a universal interconnect interface and other processing devices. The computing device (501) interacts with the other processing devices to jointly complete a computing operation specified by a user. The integrated circuit device may further include a storage device, which is connected to the computing device (501) and the other processing devices, respectively, and is used for data storage of the computing device (501) and the other processing devices.

Description

Multi-core chip, integrated circuit device, board card, and manufacturing method therefor
Cross-Reference to Related Application
This application claims priority to Chinese patent application No. 202111172907.2, filed on October 8, 2021, and entitled "Multi-core chip, integrated circuit device, board card, and manufacturing method therefor".
Technical Field
The present invention relates generally to the field of semiconductors, and more specifically to a multi-core chip, an integrated circuit device, a board card, and a manufacturing method therefor.
Background
Since the advent of the big-data era, system-on-chips incorporating artificial-intelligence technology have had to cope with increasingly complex environments, forcing them to develop more functions, and current chip designs are already approaching the maximum reticle size. Developers have therefore tried to partition the system-on-chip into multi-chip modules, which must be connected over ultra-short and extra-short distances to achieve high-speed data transfer between dies. Besides expanding bandwidth as much as possible, the die-to-die (D2D) connection is a solution with extremely low latency and extremely low power consumption.
A die-to-die interface is a functional block that occupies a small area of a die and provides a data interface between two modules or two dies assembled in the same package. The die-to-die interface uses very short channels to connect the modules or dies within the package, and its transfer rate and bandwidth exceed those of a traditional chip-to-chip interface.
In the prior art, two modules or dies connected by die-to-die interfaces are usually placed side by side with their die-to-die interfaces adjacent, and the two die-to-die interfaces are electrically connected through an underlying interposer layer. Although the transfer rate and bandwidth of the die-to-die interface are excellent, when data are transmitted through the underlying interposer, the transmission path is on the order of millimeters. An overly long transmission path causes signal attenuation and a reduced rate, and still cannot satisfy the requirements of high-intensity computing.
Therefore, a technical solution that exploits the advantages of the die-to-die interface is urgently needed.
Summary of the Invention
In order to at least partially solve the technical problems mentioned in the background, the solution of the present invention provides a multi-core chip, an integrated circuit device, a board card, and manufacturing methods therefor.
In one aspect, the present invention discloses a multi-core chip including a first core layer and a second core layer. The first core layer includes a first operation area, in which a first operation circuit is generated, and a first die-to-die area, in which a first transceiver circuit is generated. The second core layer includes a second operation area, in which a second operation circuit is generated, and a second die-to-die area, in which a second transceiver circuit is generated. The first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit.
In another aspect, the present invention discloses an integrated circuit device including the aforementioned multi-core chip, and also discloses a board card including the aforementioned integrated circuit device.
In another aspect, the present invention discloses a method of making a multi-core chip, including: generating a first core layer, the first core layer including a first operation area, in which a first operation circuit is generated, and a first die-to-die area, in which a first transceiver circuit is generated; and generating a second core layer, the second core layer including a second operation area, in which a second operation circuit is generated, and a second die-to-die area, in which a second transceiver circuit is generated. The first core layer and the second core layer are vertically stacked, and the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit.
Through the vertical stacking of the die-to-die areas, the multi-core chip of the present invention allows the two die-to-die interfaces to transfer data without passing through an interposer; the transmission path between the two die-to-die interfaces is greatly shortened, which helps improve the transfer efficiency between cores.
Brief Description of the Drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily understandable by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not limitation, and identical or corresponding reference numerals indicate identical or corresponding parts, in which:
FIG. 1 shows a top view of the layout of a package structure including die-to-die interfaces;
FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line;
FIG. 3 is a structural diagram showing a board card according to an embodiment of the present invention;
FIG. 4 shows a schematic diagram of a chip according to an embodiment of the present invention;
FIG. 5 is a structural diagram showing an integrated circuit device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram showing the vertical stacking of another embodiment of the present invention;
FIG. 7 is a schematic diagram showing the vertical stacking of another embodiment of the present invention;
FIG. 8 is a schematic diagram showing the vertical stacking of another embodiment of the present invention;
FIG. 9 is a schematic diagram showing the vertical stacking of another embodiment of the present invention;
FIG. 10 is a schematic diagram showing the vertical stacking of another embodiment of the present invention;
FIG. 11 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 4;
FIG. 12 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 6;
FIG. 13 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 7;
FIG. 14 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 8;
FIG. 15 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 9; and
FIG. 16 is a flow chart showing a method of another embodiment of the present invention for making the multi-core chip of FIG. 10.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, description, and drawings of the present invention are used to distinguish different objects rather than to describe a particular order. The terms "comprise" and "include" used in the description and claims of the present invention indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this description of the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in the description and claims of the present invention, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the description and claims of the present invention refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this description and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A die-to-die interface, like any other chip-to-chip interface, establishes a data link between two dies. The die-to-die interface is logically divided into a physical layer, a link layer, and a transaction layer, and provides a standardized parallel interface connected to the internal interconnect structure.
FIG. 1 shows a top view of the layout of a package structure including die-to-die interfaces. The layout of this package structure is located in the molding-compound region 10 of the wafer. The molding-compound region 10 includes a system region and storage regions; in this example the system region is located in the center of the molding-compound region 10 to accommodate two systems-on-chip 101, and the storage regions are located on both sides of the system region to accommodate eight off-chip memories 102.
The system region is further provided with a die-to-die area 103, a physical area 104, and an input-output area 105. A transceiver circuit is generated in the die-to-die area 103 for sharing data between the two systems-on-chip 101; a physical access circuit is generated in the physical area 104 for accessing the off-chip memories 102; and an input-output circuit is generated in the input-output area 105 to serve as the interface of the systems-on-chip 101 for external communication.
A memory 106 is also placed in the system region as scratch space for the systems-on-chip 101; its capacity is smaller than that of the off-chip memories 102, but its data transfer rate is higher.
FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line. As shown, the system region is divided into an upper layer and a lower layer: the upper layer is the system-on-chip 101, and the lower layer contains the transceiver circuit of the die-to-die area 103, the memory 106, and the input-output circuit of the input-output area 105. The package structure further includes an interposer 201 and a substrate 202, the interposer 201 being disposed on the substrate 202. When the two systems-on-chip 101 transfer data, the path is: sending-side system-on-chip 101 → transceiver circuit of the sending-side die-to-die area 103 → interposer 201 → transceiver circuit of the receiving-side die-to-die area 103 → receiving-side system-on-chip 101, thereby achieving the low-latency and low-power technical effect of the die-to-die port.
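To put the interposer detour in perspective, the following back-of-the-envelope sketch compares the two routes. All dimensions are illustrative assumptions (the document only states millimeter scale for the interposer route versus roughly a dozen micrometers for a TSV), not figures from the patent.

```python
# Illustrative path-length comparison (assumed numbers): the prior-art
# die-to-die link of FIG. 2 detours laterally through the interposer 201,
# while a vertically stacked arrangement crosses a single TSV.

INTERPOSER_ROUTE_UM = 2_000   # assumed millimeter-scale lateral route
BUMP_DROP_UM = 50             # assumed vertical drop from die to interposer
TSV_LENGTH_UM = 15            # TSVs are roughly a dozen micrometers long

def prior_art_path_um() -> int:
    # sender transceiver -> down to interposer -> lateral route -> up to receiver
    return BUMP_DROP_UM + INTERPOSER_ROUTE_UM + BUMP_DROP_UM

def stacked_path_um() -> int:
    # sender transceiver -> vertical TSV -> receiver transceiver
    return TSV_LENGTH_UM

ratio = prior_art_path_um() / stacked_path_um()
assert ratio > 100  # the stacked route is roughly two orders of magnitude shorter
```

Under these assumptions the stacked route is about 140× shorter, which is consistent with the signal-attenuation argument made in the background section.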
FIG. 3 shows a schematic structural diagram of a board card 30 according to an embodiment of the present invention. As shown in FIG. 3, the board card 30 includes a chip 301, which is a system-on-chip integrating one or more combined processing devices. The combined processing device is an artificial-intelligence computing unit that supports various deep-learning and machine-learning algorithms and meets the intelligent-processing demands of complex scenarios in fields such as computer vision, speech, natural-language processing, and data mining. Deep-learning technology in particular is widely applied in the field of cloud intelligence; a notable feature of cloud intelligence applications is the large amount of input data, which places high demands on the storage and computing capabilities of the platform. The board card 30 of this embodiment is suitable for cloud intelligence applications, having large off-chip storage, large on-chip storage, and powerful computing capability.
The chip 301 is connected to an external device 303 through an external interface device 302. The external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. Data to be processed can be transferred from the external device 303 to the chip 301 through the external interface device 302, and the computation results of the chip 301 can be transmitted back to the external device 303 via the external interface device 302. Depending on the application scenario, the external interface device 302 may have different interface forms, such as a PCIe interface.
In more detail, the chip 301 includes a computing device and a processing device. The computing device is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for performing deep-learning or machine-learning computations. The processing device, as a general-purpose processing device, performs basic control including but not limited to data transfer and starting and/or stopping the computing device. Depending on the implementation, the processing device may be one or more types of processor among a central processing unit (CPU), a graphics processing unit (GPU), or other general-purpose and/or special-purpose processors, including but not limited to a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components, and their number can be determined according to actual needs. As mentioned above, the computing device of this embodiment alone can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device and the processing device are considered together, the two are regarded as forming a heterogeneous multi-core structure.
The board card 30 further includes a storage device 304 for storing data, which includes one or more storage units 305. The storage device 304 is connected to and exchanges data with a control device 306 and the chip 301 through a bus. The control device 306 on the board card 30 is configured to regulate the state of the chip 301; to this end, in one application scenario, the control device 306 may include a micro controller unit (MCU).
FIG. 4 shows a schematic diagram of the chip 301 of this embodiment, which is a multi-core chip including a first core layer 41 and a second core layer 42. In reality the first core layer 41 and the second core layer 42 are vertically stacked together; they are shown visually separated in FIG. 4 only for ease of illustration.
The first core layer 41 includes a first operation area 411, a first die-to-die area 412, and first through-silicon vias (TSVs) 413. A first operation circuit is generated in the first operation area 411 to implement the functions of the computing device; a first transceiver circuit is generated in the first die-to-die area 412 to serve as the die-to-die interface of the first operation circuit; and the first TSVs 413 are used to achieve electrical interconnection of the stacked chips in the three-dimensional integrated circuit. The second core layer 42 includes a second operation area 421, a second die-to-die area 422, and second TSVs 423. A second operation circuit is generated in the second operation area 421 to implement the functions of the processing device; a second transceiver circuit is generated in the second die-to-die area 422 to serve as the die-to-die interface of the second operation circuit; and the second TSVs 423 are likewise used to achieve electrical interconnection of the stacked chips in the three-dimensional integrated circuit.
In this embodiment, a memory 414 and a memory 424 are also generated in the first operation area 411 and the second operation area 421, respectively, to temporarily store the operation results of the first operation circuit and the second operation circuit. The memories 414 and 424 are placed directly in the first operation area 411 and the second operation area 421 and do not need to pass through an interposer, so their data transfer rate is fast.
The first core layer 41 further includes an input-output area 415 and a physical area 416, and the second core layer 42 further includes an input-output area 425 and a physical area 426. Input-output circuits are generated in the input-output areas 415 and 425 to serve as the interfaces of the first core layer 41 and the second core layer 42, respectively, for external communication. Physical access circuits are generated in the physical areas 416 and 426 to serve as the interfaces of the first core layer 41 and the second core layer 42, respectively, for accessing off-chip memory.
When the computing device and the processing device exchange data, the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit. Specifically, when the computing device transmits data to the processing device, the data reach the processing device along the following path: first operation circuit of the first operation area 411 → first transceiver circuit of the first die-to-die area 412 → first TSVs 413 → second transceiver circuit of the second die-to-die area 422 → second operation circuit of the second operation area 421. When the processing device transmits data to the computing device, the data travel the following path: second operation circuit of the second operation area 421 → second transceiver circuit of the second die-to-die area 422 → first TSVs 413 → first transceiver circuit of the first die-to-die area 412 → first operation circuit of the first operation area 411.
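The two symmetrical transfer paths just described can be traced with a small hypothetical model; the string labels below are shorthand for the reference numerals of FIG. 4 and are illustrative only.

```python
# Hypothetical trace of the FIG. 4 inter-layer transfer path: the first
# operation circuit reaches the second operation circuit only through the
# two transceiver circuits and the first TSVs 413 -- never an interposer.

DOWNLINK = [
    "first operation circuit (area 411)",
    "first transceiver circuit (die-to-die area 412)",
    "first TSVs 413",
    "second transceiver circuit (die-to-die area 422)",
    "second operation circuit (area 421)",
]

def reverse_path(path):
    """The processing-device-to-computing-device path is simply the reverse."""
    return list(reversed(path))

UPLINK = reverse_path(DOWNLINK)
assert UPLINK[0].startswith("second operation circuit")
assert "interposer" not in " ".join(DOWNLINK)  # no interposer hop anywhere
```

The symmetry of the two directions is why the text only spells out one path in later embodiments and refers to "the reverse path" for the other.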
When the computation results of the computing device need to be exchanged with other off-chip devices, the memory 414 transmits the data to the other devices through the input-output circuit. Specifically, when data in the memory 414 are to be transmitted to other off-chip devices, they travel the following path: input-output circuit of the input-output area 415 → first TSVs 413 → second TSVs 423; when other off-chip devices transmit data to the memory 414, the data travel the reverse path. Note that certain specific TSVs among the first TSVs 413 and the second TSVs 423 are dedicated to electrically conducting the data of the input-output circuit.
When the computation results of the processing device need to be exchanged with other off-chip devices, data in the memory 424 travel the following path: input-output circuit of the input-output area 425 → second TSVs 423; when other off-chip devices transmit data to the memory 424, the data travel the reverse path.
When the computation results of the computing device need to be stored in off-chip memory through the physical area 416, the memory 414 transmits the data to the off-chip memory through the physical access circuit. Specifically, when data in the memory 414 are to be transmitted to the off-chip memory, they travel the following path: physical access circuit of the physical area 416 → first TSVs 413 → second TSVs 423; when the off-chip memory transmits input data to the memory 414 for processing by the computing device, the data travel the reverse path. Note that certain specific TSVs among the first TSVs 413 and the second TSVs 423 are dedicated to electrically conducting the data of the physical access circuit.
When the computation results of the processing device need to be stored in off-chip memory through the physical area 426, the memory 424 transmits the data to the off-chip memory through the physical access circuit. Specifically, when data in the memory 424 are to be transmitted to the off-chip memory, they travel the following path: physical access circuit of the physical area 426 → second TSVs 423; when the off-chip memory transmits input data to the memory 424 for processing, the data travel the reverse path.
As shown in FIG. 4, the first die-to-die area 412 and the second die-to-die area 422 are vertically stacked, so that the die-to-die interface of the first core layer 41 and the die-to-die interface of the second core layer 42 are electrically connected directly through the first TSVs 413, without using the interposer 201 shown in FIG. 2 for transmission. The length of a TSV is roughly a dozen micrometers; compared with the millimeter-scale length of an interposer route, the data transfer of this embodiment is faster and the signal strength is better.
Another embodiment of the present invention is also the board card 30 shown in FIG. 3, and the structure of the combined processing device in its chip 301 is shown in FIG. 5. The combined processing device 50 includes a computing device 501, an interface device 502, a processing device 503, and an off-chip memory 504.
The computing device 501 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for performing deep-learning or machine-learning computations; it can interact with the processing device 503 through the interface device 502 to jointly complete the user-specified operations.
The interface device 502 is connected to a bus for connection with other devices, such as the control device 306 and the external interface device 302 of FIG. 3.
The processing device 503, as a general-purpose processing device, performs basic control including but not limited to data transfer and starting and/or stopping the computing device 501. Depending on the implementation, the processing device 503 may be one or more types of processor among a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors, including but not limited to a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components, and their number can be determined according to actual needs. As mentioned above, the computing device 501 of this embodiment alone can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device 501 and the processing device 503 are considered together, the two are regarded as forming a heterogeneous multi-core structure.
The off-chip memory 504 is used to store data to be processed. It is DDR memory, typically 16 GB or larger in size, and is used to save the data of the computing device 501 and/or the processing device 503.
FIG. 6 shows a schematic diagram of the vertical stacking of this embodiment. This embodiment is likewise a multi-core chip, including a first core layer 61, a second core layer 62, and a memory layer 63. In reality the first core layer 61, the second core layer 62, and the memory layer 63 are vertically stacked together in order from top to bottom; the layers in FIG. 6 are shown visually separated only for ease of illustration.
The first core layer 61 includes a first operation area 611 that covers the logic layer of the first core layer 61, i.e., the top side of the first core layer 61 in the figure; in particular regions the first core layer 61 further includes a first die-to-die area 612 and first TSVs 613. The second core layer 62 includes a second operation area 621 that covers the logic layer of the second core layer 62, i.e., the top side of the second core layer 62 in the figure; in particular regions the second core layer 62 further includes a second die-to-die area 622 and second TSVs 623. The first die-to-die area 612 and the second die-to-die area 622 face each other vertically. Their functions and effects are the same as in the foregoing embodiment and are not repeated here.
The memory layer 63 includes a memory area 631, a first input-output area 632, a second input-output area 633, a first physical area 634, a second physical area 635, and third TSVs 636. A storage unit is generated in the memory area 631 to temporarily store the operation results of the first operation circuit or the second operation circuit. A first input-output circuit is generated in the first input-output area 632 to serve as the interface of the first operation circuit for external communication, i.e., to implement the functions of the interface device 502; a second input-output circuit is generated in the second input-output area 633 to serve as the interface of the second operation circuit for external communication, likewise implementing the functions of the interface device 502. A first physical access circuit is generated in the first physical area 634 to send the computation results of the first operation circuit stored in the memory area 631 to the off-chip memory 504, and a second physical access circuit is generated in the second physical area 635 to send the computation results of the second operation circuit stored in the memory area 631 to the off-chip memory 504. The third TSVs 636 are distributed throughout the memory layer 63 and are shown on one side only by way of example; they electrically connect specific elements.
When the computing device 501 and the processing device 503 exchange data, the first operation circuit and the second operation circuit perform inter-layer data transfer through the first transceiver circuit and the second transceiver circuit. Specifically, when the computing device 501 transmits data to the processing device 503, the data travel the following path: first operation circuit of the first operation area 611 → first transceiver circuit of the first die-to-die area 612 → first TSVs 613 → second transceiver circuit of the second die-to-die area 622 → second operation circuit of the second operation area 621; when the processing device 503 transmits data to the computing device 501, the data travel the reverse path. Note that certain specific TSVs among the first TSVs 613 are dedicated to electrically connecting the first transceiver circuit and the second transceiver circuit.
When the computation results of the computing device 501 need to be exchanged with other off-chip devices through the interface device 502, the memory area 631 transmits the data to the other devices through the first input-output circuit. Specifically, when data in the memory area 631 are to be transmitted to other off-chip devices, they travel the following path: input-output circuit of the first input-output area 632 → third TSVs 636; when other off-chip devices exchange data with the computing device 501, the data travel the reverse path to reach the memory area 631.
When the computation results of the processing device 503 need to be exchanged with other off-chip devices through the interface device 502, the memory area 631 transmits the data to the other devices through the second input-output circuit. Specifically, when data in the memory area 631 are to be transmitted to other off-chip devices, they travel the following path: input-output circuit of the second input-output area 633 → third TSVs 636; when other off-chip devices exchange data with the processing device 503, the data travel the reverse path to reach the memory area 631.
Note that certain specific TSVs among the third TSVs 636 are dedicated to electrically conducting the data of the first and second input-output circuits.
When the computation results of the computing device 501 need to be stored in the off-chip memory 504 through the first physical area 634, the memory area 631 transmits the data to the off-chip memory 504 through the first physical access circuit. Specifically, when data in the memory area 631 are to be transmitted to the off-chip memory 504, they travel the following path: first physical access circuit of the first physical area 634 → third TSVs 636; when the off-chip memory 504 transmits input data to the memory area 631 for processing by the computing device 501, the data travel the reverse path.
When the computation results of the processing device 503 need to be stored in the off-chip memory 504 through the second physical area 635, the memory area 631 transmits the data to the off-chip memory 504 through the second physical access circuit. Specifically, when data in the memory area 631 are to be transmitted to the off-chip memory 504, they travel the following path: second physical access circuit of the second physical area 635 → third TSVs 636; when the off-chip memory 504 transmits input data to the memory area 631 for processing by the processing device 503, the data travel the reverse path.
Note that certain specific TSVs among the third TSVs 636 are dedicated to electrically conducting the data of the first physical access circuit and the second physical access circuit.
As shown in FIG. 6, the first die-to-die area 612 and the second die-to-die area 622 are vertically stacked, so that the die-to-die interface of the first core layer 61 and the die-to-die interface of the second core layer 62 are electrically connected directly through the first TSVs 613, without using the interposer 201 shown in FIG. 2 for transmission.
本发明的另一个实施例同样是实现如图5所示的结构。图7示出此实施例纵向堆叠的示意图。此实施例同样是一种多核芯片,包括第一核层71、第一内存层72、第二核层73及第二内存层74,实际上第一核层71、第一内存层72、第二核层73及第二内存层74依序纵向堆叠在一块,图7中的各层视觉上为上下分离仅为了方便说明而以此方式展示。
第一核层71包括第一运算区711,第一运算区711布满第一核层71的逻辑层,即图中第一核层71的顶侧,第一核层71在特别区域还包括第一晶粒对晶粒区712及第一硅通孔713,第二核层73包括第二运算区731,第二运算区731布满第二核层73的逻辑层,即图中第二核层73的顶侧,第二核层73在特别区域还包括第二晶粒对晶粒区732及第二硅通孔733,其功能和作用与前述实施例相同,故不赘述。
第一内存层72包括第一内存区721、第一输入输出区722、第一物理区723及第三硅通孔724。第一内存区721生成有存储单元,用以暂存第一运算电路的运算结果。第一输入输出区722生成有第一输入输出电路,用以作为第一核层71与第一内存层72对外联系的接口,即实现接口装置502的功能。第二物理区723生成有第一物理访问电路,用以访问片外内存504。第三硅通孔724遍布整个第一内存层72,示例性仅显示于一侧,用以电性连接特定的元件。
第二内存层74包括第二内存区741、第二输入输出区742、第二物理区743及第四硅通孔744。第二内存区741生成有存储单元,用以暂存第二运算电路的运算结果。第二输入输出区742生成有第二输入输出电路,用以作为第二核层73与第二内存层74对外联系的接口,即实现接口装置502的功能。第二物理区743生成有第二物理访问电路,用以访问片外内存504。第四硅通孔744遍布整个第二内存层74,示例性仅显示于一侧,用以电 性连接特定的元件。
各层的硅通孔如有必要,将分别包括收发硅通孔、输入输出硅通孔及物理硅通孔。收发硅通孔用来电性连接第一收发电路和第二收发电路,输入输出硅通孔用以电性传导输入输出电路的数据,物理硅通孔用以电性传导运算电路的运算结果至片外内存504。
当计算装置501欲传输数据至处理装置503时,数据通过以下路径到达处理装置503:第一运算区711的第一运算电路→第一晶粒对晶粒区712的第一收发电路→第一硅通孔713的收发硅通孔→第三硅通孔724的收发硅通孔→第二晶粒对晶粒区732的第二收发电路→第二运算区731的第二运算电路;当处理装置503欲传输数据至计算装置501时,数据通过前述的反向路径到达计算装置501。
当计算装置501的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据通过以下路径到达片外的其他装置:第一输入输出区722的第一输入输出电路→第三硅通孔724的输入输出硅通孔→第二硅通孔733的输入输出硅通孔→第四硅通孔744的输入输出硅通孔;当片外的其他装置欲传输数据至第一内存区721时,数据通过前述的反向路径到达第一内存区721。当处理装置503的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据通过以下路径到达片外的其他装置:第二输入输出区742的输入输出电路→第四硅通孔744的输入输出硅通孔;当片外的其他装置欲传输数据至第二内存区741时,数据通过前述的反向路径到达第二内存区741。
当第一内存区721的数据欲传输至片外内存504时,数据通过以下路径到达片外内存504:第一物理区723的第一物理访问电路→第三硅通孔724的物理硅通孔→第二硅通孔733的物理硅通孔→第四硅通孔744的物理硅通孔;当片外内存504欲传输输入数据至第一内存区721供计算装置501进行处理时,数据通过前述的反向路径到达第一内存区721。当第二内存区741的数据欲传输至片外内存504时,数据通过以下路径到达片外内存504:第二物理区743的第二物理访问电路→第四硅通孔744的物理硅通孔;当片外内存504欲传输输入数据至第二内存区741供处理装置503进行处理时,数据通过前述的反向路径到达第二内存区741。
在此实施例中,第一核层71与第一内存层72搭配使用,第二核层73与第二内存层74搭配使用,为了传输效率,第一核层71与第一内存层72采用面对面贴合制程,使得第一运算电路与第一内存区721的传输路径最短,第二核层73与第二内存层74采用面对面贴合制程,同样使得第二运算电路与第二内存区741的传输路径最短。为了实现前述最短传输路径,第一内存层72与第二核层73则采用背对背贴合制程。
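上述贴合制程的搭配可用如下示意性代码验证(仅为便于理解的草图,以"逻辑面朝向"建模各层,各层朝向为依据说明书推断的假设):

```python
# 示意性草图：以逻辑面朝向("上"/"下")判定图7中相邻两层的贴合方式。
def interface(upper, lower):
    """判定上层底面与下层顶面的贴合类型。upper/lower 为(层名, 逻辑面朝向)。"""
    _, u_face = upper
    _, l_face = lower
    u_bottom_is_face = (u_face == "下")  # 上层逻辑面朝下时，其底面为逻辑面
    l_top_is_face = (l_face == "上")     # 下层逻辑面朝上时，其顶面为逻辑面
    if u_bottom_is_face and l_top_is_face:
        return "面对面"
    if not u_bottom_is_face and not l_top_is_face:
        return "背对背"
    return "面对背"

# 图7由上至下的堆叠顺序与各层朝向（朝向为假设）
STACK = [("第一核层71", "下"), ("第一内存层72", "上"),
         ("第二核层73", "下"), ("第二内存层74", "上")]

def bondings(stack):
    """返回相邻各层间的贴合类型列表。"""
    return [interface(stack[i], stack[i + 1]) for i in range(len(stack) - 1)]
```

按此建模,核层逻辑面朝下、内存层逻辑面朝上时,恰好得到"面对面、背对背、面对面"的贴合顺序,与说明书所述的最短传输路径安排一致。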
如图7所示,第一晶粒对晶粒区712与第二晶粒对晶粒区732纵向堆叠,使得第一核层71的晶粒对晶粒接口与第二核层73的晶粒对晶粒接口直接通过第一硅通孔713与第三硅通孔724电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例同样是实现如图5所示的结构。图8示出此实施例纵向堆叠的示意图。此实施例的多核芯片包括第一核层81、第一内存层82、第二核层83、第二内存层84、第三内存层85及第四内存层86,更详细来说,此实施例的多核芯片分为第一晶粒组和第二晶粒组,第一晶粒组堆叠在第二晶粒组上,第一晶粒组由上至下分别为第三内存层85、第一核层81及第一内存层82,第二晶粒组由上至下分别为第四内存层86、第二核层83及第二内存层84,即第四内存层86位于第一内存层82与第二核层83间。图8中的各层视觉上为上下分离仅为了方便说明而以此方式展示。
第一核层81、第一内存层82、第二核层83、第二内存层84的功能和作用与前述实施例中的第一核层71、第一内存层72、第二核层73、第二内存层74相同,故不赘述。
第三内存层85包括第三内存区851及第五硅通孔852,第三内存区851布满第三内存层85的逻辑层,即图中第三内存层85的顶侧。第三内存区851生成有存储单元,用以暂存第一运算电路的运算结果,第五硅通孔852遍布整个第三内存层85,示例性仅显示于一侧,用以电性连接特定的元件。第三内存层85仅负责暂存第一运算电路的运算结果,不负责第一晶粒组对外的联系任务。第一运算电路可以使用第一内存区821和第三内存区851的暂存空间,当计算装置501欲暂存中间数据时,可以通过第五硅通孔852暂存至第三内存区851,或是通过第一硅通孔813暂存至第一内存区821。
第四内存层86包括第四内存区861及第六硅通孔862,第四内存区861布满第四内存层86的逻辑层,即图中第四内存层86的顶侧。第四内存区861生成有存储单元,用以暂存第二运算电路的运算结果,第六硅通孔862遍布整个第四内存层86,示例性仅显示于一侧,用以电性连接特定的元件。第四内存层86仅负责暂存第二运算电路的运算结果,不负责第二晶粒组对外的联系任务。第二运算电路可以使用第二内存区841和第四内存区861的暂存空间,当处理装置503欲暂存中间数据时,可以通过第六硅通孔862暂存至第四内存区861,或是通过第二硅通孔833暂存至第二内存区841。
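运算电路在两块可用暂存区之间的选择,可用如下示意性代码说明(仅为便于理解的草图;具体选择策略为假设,专利并未限定):

```python
# 示意性草图：运算电路在两块可用暂存区之间的简单选择策略。
def choose_scratch(preferred, fallback, need):
    """优先使用 preferred（传输路径较短的暂存区），容量不足时改用 fallback。
    每个暂存区以 (名称, 剩余容量) 表示；need 为所需容量。"""
    name, free = preferred
    if free >= need:
        return name
    name2, free2 = fallback
    if free2 >= need:
        return name2
    raise MemoryError("两个暂存区的剩余容量均不足")
```

例如,第一运算电路可优先使用面对面贴合、路径最短的第一内存区821,不足时再经第五硅通孔852改用第三内存区851。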
各层的硅通孔如有必要,将分别包括收发硅通孔、输入输出硅通孔及物理硅通孔。收发硅通孔用来电性连接第一收发电路和第二收发电路,输入输出硅通孔用以电性传导输入输出电路的数据,物理硅通孔用以电性传导运算电路的运算结果至片外内存504。
当计算装置501欲传输数据至处理装置503时,数据通过以下路径到达处理装置503:第一运算区811的第一运算电路→第一晶粒对晶粒区812的第一收发电路→第一硅通孔813的收发硅通孔→第三硅通孔824的收发硅通孔→第六硅通孔862的收发硅通孔→第二晶粒对晶粒区832的第二收发电路→第二运算区831的第二运算电路;当处理装置503欲传输数据至计算装置501时,数据通过前述的反向路径到达计算装置501。
当第一晶粒组的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据通过以下路径到达片外的其他装置:第一输入输出区822的第一输入输出电路→第三硅通孔824的输入输出硅通孔→第六硅通孔862的输入输出硅通孔→第二硅通孔833的输入输出硅通孔→第四硅通孔844的输入输出硅通孔;当片外的其他装置欲传输数据至第一晶粒组时,数据通过前述的反向路径到达第一内存区821。当第二晶粒组的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据通过以下路径到达片外的其他装置:第二输入输出区842的第二输入输出电路→第四硅通孔844的输入输出硅通孔;当片外的其他装置欲传输数据至第二晶粒组时,数据通过前述的反向路径到达第二内存区841。
当第一晶粒组的数据欲传输至片外内存504时,数据通过以下路径到达片外内存504:第一物理区823的第一物理访问电路→第三硅通孔824的物理硅通孔→第六硅通孔862的物理硅通孔→第二硅通孔833的物理硅通孔→第四硅通孔844的物理硅通孔;当片外内存504欲传输输入数据至第一晶粒组供计算装置501进行处理时,数据通过前述的反向路径到达第一内存区821。当第二晶粒组的数据欲传输至片外内存504时,数据通过以下路径到达片外内存504:第二物理区843的第二物理访问电路→第四硅通孔844的物理硅通孔;当片外内存504欲传输输入数据至第二晶粒组供处理装置503进行处理时,数据通过前述的反向路径到达第二内存区841。
在此实施例中,第一核层81与第一内存层82和第三内存层85搭配使用,第二核层83与第二内存层84和第四内存层86搭配使用,为了传输效率,第一核层81与第一内存层82采用面对面贴合制程,使得第一运算电路与第一内存区821的传输路径最短,第一核层81与第三内存层85采用面对背贴合制程,第一内存层82与第四内存层86采用背对背贴合制程,第二核层83与第四内存层86采用面对面贴合制程,同样使得第二运算电路与第四内存区861的传输路径最短,第二核层83与第二内存层84采用面对背贴合制程。
如图8所示,第一晶粒对晶粒区812与第二晶粒对晶粒区832纵向堆叠,使得第一核层81的晶粒对晶粒接口与第二核层83的晶粒对晶粒接口直接通过第一硅通孔813、第三硅通孔824与第六硅通孔862电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例同样是实现如图5所示的结构。图9示出此实施例纵向堆叠的示意图。此实施例的多核芯片由上至下堆叠分为第一晶粒组、第二晶粒组和第三晶粒组。第一晶粒组由上至下分别为第一核层91及第一内存层92,第二晶粒组由上至下分别为第二核层93及第二内存层94,第三晶粒组仅包括第三内存层95,故第三内存层95位于第二内存层94下。图9中的各层视觉上为上下分离仅为了方便说明而以此方式展示。
第一核层91包括第一运算区911,第一运算区911布满第一核层91的逻辑层,即图中第一核层91的顶侧,第一核层91在特别区域还包括第一晶粒对晶粒区912及第一硅通孔913,第一内存层92包括第一内存区921及第二硅通孔922,第一内存区921布满第一内存层92的逻辑层,即图中第一内存层92的顶侧。第一内存区921生成有存储单元,用以暂存第一运算电路的运算结果。第二核层93包括第二运算区931,第二运算区931布满第二核层93的逻辑层,即图中第二核层93的顶侧,第二核层93在特别区域还包括第二晶粒对晶粒区932及第三硅通孔933,第二内存层94包括第二内存区941及第四硅通孔942,第二内存区941布满第二内存层94的逻辑层,即图中第二内存层94的顶侧,第二内存区941生成有存储单元,用以暂存第二运算电路的运算结果。
第三内存层95包括第三内存区951、第一输入输出区952、第二输入输出区953、第一物理区954、第二物理区955及第五硅通孔956,第三内存区951生成有存储单元,用以暂存第一运算电路或第二运算电路的运算结果,第一输入输出区952生成有第一输入输出电路,用以作为第一晶粒组对外联系的接口,即实现接口装置502的功能,第二输入输出区953生成有第二输入输出电路,用以作为第二晶粒组对外联系的接口,即实现接口装置502的功能,第一物理区954生成有第一物理访问电路,用以联系第一晶粒组与片外内存504,第二物理区955生成有第二物理访问电路,用以联系第二晶粒组与片外内存504。
各硅通孔遍布整个层中,示例性仅显示于一侧。各层的硅通孔如有必要,将分别包括收发硅通孔、输入输出硅通孔及物理硅通孔。收发硅通孔用来电性连接第一收发电路和第二收发电路,输入输出硅通孔用以电性传导输入输出电路的数据,物理硅通孔用以电性传导运算电路的运算结果至片外内存504。
当计算装置501欲传输数据至处理装置503时,数据通过以下路径到达处理装置503:第一运算区911的第一运算电路→第一晶粒对晶粒区912的第一收发电路→第一硅通孔913的收发硅通孔→第二硅通孔922的收发硅通孔→第二晶粒对晶粒区932的第二收发电路→第二运算区931的第二运算电路;当处理装置503欲传输数据至计算装置501时,数据通过前述的反向路径到达计算装置501。
第一晶粒组与第二晶粒组不直接对片外联系,当需要对片外联系时,此实施例通过第三晶粒组的第三内存层95来执行。
当计算装置501的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据会通过各层的输入输出硅通孔传送至第三内存区951暂存,再由第三内存区951通过以下路径到达片外的其他装置:第一输入输出区952的第一输入输出电路→第五硅通孔956的第一输入输出硅通孔;当片外的其他装置欲传输数据至第一晶粒组时,数据通过前述的反向路径先暂存在第三内存区951,再从第三内存区951传送至第一内存区921。
当处理装置503的计算结果需要通过接口装置502与片外的其他装置进行数据交换时,数据会通过各层的输入输出硅通孔传送至第三内存区951暂存,再由第三内存区951通过以下路径到达片外的其他装置:第二输入输出区953的第二输入输出电路→第五硅通孔956的第二输入输出硅通孔;当片外的其他装置欲传输数据至第二晶粒组时,数据通过前述的反向路径先暂存在第三内存区951,再从第三内存区951传送至第二内存区941。
当第一内存区921的数据欲传输至片外内存504时,数据会通过各层的物理硅通孔传送至第三内存区951暂存,再由第三内存区951通过以下路径到达片外内存504:第一物理区954的第一物理访问电路→第五硅通孔956的第一物理硅通孔;当片外内存504欲传输输入数据至第一晶粒组时,输入数据通过前述的反向路径先暂存在第三内存区951,再从第三内存区951传送至第一内存区921。
当第二内存区941的数据欲传输至片外内存504时,数据会通过第四硅通孔942的物理硅通孔传送至第三内存区951暂存,再由第三内存区951通过以下路径到达片外内存504:第二物理区955的第二物理访问电路→第五硅通孔956的第二物理硅通孔;当片外内存504欲传输输入数据至第二晶粒组时,输入数据通过前述的反向路径先暂存在第三内存区951,再从第三内存区951通过第四硅通孔942的物理硅通孔传送至第二内存区941。
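此种"先暂存、后转发"的两段式传输可用如下示意性代码模拟(仅为便于理解的草图,类名与方法名均为假设):

```python
# 示意性草图：模拟图9中数据先暂存于第三内存区951、再送往片外内存504的两段式传输。
class ThirdMemoryRelay:
    def __init__(self):
        self.staging = []   # 模拟第三内存区951的暂存空间
        self.off_chip = []  # 模拟片外内存504

    def push_from_die_group(self, data):
        """第一/第二晶粒组的数据经物理硅通孔先写入暂存区。"""
        self.staging.append(data)

    def flush_to_off_chip(self):
        """再由第一/第二物理访问电路经第五硅通孔956依序送往片外内存504。"""
        while self.staging:
            self.off_chip.append(self.staging.pop(0))
```

由该模型可见,第一晶粒组与第二晶粒组对片外的所有数据交换均经由第三内存层95中转,二者本身无需直接对外联系。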
在此实施例中,第一核层91与第一内存层92搭配使用,第二核层93与第二内存层94搭配使用,为了传输效率,第一核层91与第一内存层92采用面对面贴合制程,使得第一运算电路与第一内存区921的传输路径最短,第二核层93与第二内存层94采用面对面贴合制程,同样使得第二运算电路与第二内存区941的传输路径最短。为了实现前述最短传输路径,第一内存层92与第二核层93则采用背对背贴合制程,第二内存层94与第三内存层95采用面对背贴合制程。
如图9所示,第一晶粒对晶粒区912与第二晶粒对晶粒区932纵向堆叠,使得第一核层91的晶粒对晶粒接口与第二核层93的晶粒对晶粒接口直接通过第一硅通孔913与第二硅通孔922电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例同样是实现如图5所示的结构。图10示出此实施例纵向堆叠的示意图。此实施例的多核芯片由上至下堆叠分为第一晶粒组、第二晶粒组和第三晶粒组。第一晶粒组由上至下分别为第三内存层B及第一核层A,第二晶粒组由上至下分别为第一内存层D及第二核层C,第三晶粒组仅包括第二内存层E。明显地,此实施例的纵向堆叠结构与图9的实施例差异仅在于第一晶粒组与第二晶粒组的核层与内存层位置对调,本领域技术人员基于前述实施例的说明,无需创造性的劳动便可知悉此实施例各层间的协同方式,故不赘述。
上述多个实施例都是一种纵向堆叠的片上系统,可以用FCBGA(flip chip ball grid array)或是CoWoS(chip on wafer on substrate)封装工艺来实现。FCBGA即倒装芯片球栅格阵列封装,以锡球代替针脚来连接电路,能提供最短的对外连接距离;采用这一封装不仅能提供优异的电性效能,还可以减少组件互连间的损耗及电感,降低电磁干扰,并能承受较高的工作频率。CoWoS是一种整合生产技术,先将晶粒通过CoW封装制程连接至硅晶圆,再把CoW晶粒与基板连接,整合成CoWoS;通过这种技术可以把多颗晶粒封装在一起,达到封装体积小、功耗低、引脚少的技术功效。
本发明的另一个实施例是一种制成如图4所示的多核芯片的方法,其流程图如图11所示。
在步骤1101中,生成第一核层41,第一核层包括第一运算区411及第一晶粒对晶粒区412,其中第一运算区411生成有第一运算电路,第一晶粒对晶粒区412生成有第一收发电路。在此步骤中,在第一核层41生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1102中,生成第二核层42,第二核层包括第二运算区421及第二晶粒对晶粒区422,其中第二运算区421生成有第二运算电路,第二晶粒对晶粒区422生成有第二收发电路。
第一核层41和第二核层42纵向堆叠,第一运算电路及第二运算电路通过第一收发电路及第二收发电路进行层间数据传输。本领域技术人员可以通过图4的实施例的描述知悉此实施例的技术手段,故不赘述。
在此实施例中,第一晶粒对晶粒区412与第二晶粒对晶粒区422纵向堆叠,使得第一核层41的晶粒对晶粒接口与第二核层42的晶粒对晶粒接口直接通过第一硅通孔413电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例是一种制成如图6所示的多核芯片的方法,其流程图如图12所示。
在步骤1201中,生成第一核层61,第一核层61包括第一运算区611及第一晶粒对晶粒区612,其中第一运算区611生成有第一运算电路,第一晶粒对晶粒区612生成有第一收发电路。在此步骤中,在第一核层61生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1202中,生成内存层63,在内存层63生成内存区631、输入输出区632、第一物理区634及第三硅通孔636。内存区631生成有存储单元,用以暂存第一运算电路与第二运算电路的运算结果;输入输出区632生成有输入输出电路,用以作为多核芯片对外联系的接口;第一物理区634生成有物理访问电路,用以访问片外内存504。在此步骤中,将部分的第三硅通孔636设置成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1203中,生成第二核层62,第二核层62包括第二运算区621及第二晶粒对晶粒区622,其中第二运算区621生成有第二运算电路,第二晶粒对晶粒区622生成有第二收发电路。
在此实施例中,第一核层61、内存层63及第二核层62依序堆叠,即在第一核层61和第二核层62间生成内存层63。第一晶粒对晶粒区612与第二晶粒对晶粒区622纵向堆叠,使得第一核层61的晶粒对晶粒接口与第二核层62的晶粒对晶粒接口直接通过第一硅通孔613与第三硅通孔636电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例是一种制成如图7所示的多核芯片的方法,其流程图如图13所示。
在步骤1301中,生成第一核层71,第一核层71包括第一运算区711及第一晶粒对晶粒区712,其中第一运算区711生成有第一运算电路,第一晶粒对晶粒区712生成有第一收发电路。在此步骤中,在第一核层71生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1302中,生成第一内存层72,第一内存层72包括第一内存区721,生成有存储单元,用以暂存第一运算电路的运算结果。在此步骤中,在第一内存层72生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1303中,生成第二核层73,第二核层73包括第二运算区731及第二晶粒对晶粒区732,其中第二运算区731生成有第二运算电路,第二晶粒对晶粒区732生成有第二收发电路。
在步骤1304中,生成第二内存层74,第二内存层74包括第二内存区741,生成有存储单元,用以暂存第二运算电路的运算结果。在此步骤中,在第二内存层74生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在此实施例中,第一核层71、第一内存层72、第二核层73、第二内存层74依序堆叠,更具体来说,第一晶粒对晶粒区712与第二晶粒对晶粒区732纵向堆叠,使得第一核层71的晶粒对晶粒接口与第二核层73的晶粒对晶粒接口直接通过收发硅通孔电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例是一种制成如图8所示的多核芯片的方法,其流程图如图14所示。
在步骤1401中,生成第三内存层85,第三内存层85包括第三内存区851,生成有存储单元,用以暂存第一运算电路的运算结果。
在步骤1402中,生成第一核层81,第一核层81包括第一运算区811及第一晶粒对晶粒区812,其中第一运算区811生成有第一运算电路,第一晶粒对晶粒区812生成有第一收发电路。在此步骤中,在第一核层81生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1403中,生成第一内存层82,第一内存层82包括第一内存区821,生成有存储单元,用以暂存第一运算电路的运算结果。在此步骤中,在第一内存层82生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1404中,生成第四内存层86,第四内存层86包括第四内存区861,生成有存储单元,用以暂存第二运算电路的运算结果,其中第四内存层86位于第一内存层82与第二核层83间。在此步骤中,在第四内存层86生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1405中,生成第二核层83,第二核层83包括第二运算区831及第二晶粒对晶粒区832,其中第二运算区831生成有第二运算电路,第二晶粒对晶粒区832生成有第二收发电路。
在步骤1406中,生成第二内存层84,第二内存层84包括第二内存区841,生成有存储单元,用以暂存第二运算电路的运算结果。在此步骤中,在第二内存层84生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在此实施例中,第三内存层85、第一核层81、第一内存层82、第四内存层86、第二核层83、第二内存层84依序堆叠,更具体来说,第一晶粒对晶粒区812与第二晶粒对晶粒区832纵向堆叠,使得第一核层81的晶粒对晶粒接口与第二核层83的晶粒对晶粒接口直接通过收发硅通孔电性连接,不需要利用如图2所示的中介层201进行传输。
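图14的制程步骤与最终堆叠顺序的对应关系,可用如下示意性代码表达(仅为便于理解的草图;步骤编号与层名取自说明书,stack_order 为说明用途而假设的辅助函数):

```python
# 示意性草图：将图14的制程步骤按顺序列出，并由此得到由上至下的堆叠顺序。
STEPS_FIG14 = [
    ("步骤1401", "第三内存层85"),
    ("步骤1402", "第一核层81"),
    ("步骤1403", "第一内存层82"),
    ("步骤1404", "第四内存层86"),
    ("步骤1405", "第二核层83"),
    ("步骤1406", "第二内存层84"),
]

def stack_order(steps):
    """制程步骤依序生成各层，即得到由上至下的堆叠顺序。"""
    return [layer for _step, layer in steps]
```

即步骤的执行顺序直接对应第三内存层85在最上、第二内存层84在最下的六层堆叠结构。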
本发明的另一个实施例是一种制成如图9所示的多核芯片的方法,其流程图如图15所示。
在步骤1501中,生成第一核层91,第一核层91包括第一运算区911及第一晶粒对晶粒区912,其中第一运算区911生成有第一运算电路,第一晶粒对晶粒区912生成有第一收发电路。在此步骤中,在第一核层91生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1502中,生成第一内存层92,第一内存层92包括第一内存区921,生成有存储单元,用以暂存第一运算电路的运算结果。在此步骤中,在第一内存层92生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1503中,生成第二核层93,第二核层93包括第二运算区931及第二晶粒对晶粒区932,其中第二运算区931生成有第二运算电路,第二晶粒对晶粒区932生成有第二收发电路。
在步骤1504中,生成第二内存层94,第二内存层94包括第二内存区941,生成有存储单元,用以暂存第二运算电路的运算结果。
在步骤1505中,生成第三内存层95,第三内存层95包括第三内存区951,生成有存储单元,用以暂存第一运算电路或第二运算电路的运算结果,其中第三内存层95位于第二内存层94之下。
在此实施例中,第一核层91、第一内存层92、第二核层93、第二内存层94及第三内存层95依序堆叠,更具体来说,第一晶粒对晶粒区912与第二晶粒对晶粒区932纵向堆叠,使得第一核层91的晶粒对晶粒接口与第二核层93的晶粒对晶粒接口直接通过收发硅通孔电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的另一个实施例是一种制成如图10所示的多核芯片的方法,其流程图如图16所示。
在步骤1601中,生成第三内存层B,第三内存层B包括第三内存区1021,生成有存储单元,用以暂存第一运算电路的运算结果。
在步骤1602中,生成第一核层A,第一核层A包括第一运算区1011及第一晶粒对晶粒区1012,其中第一运算区1011生成有第一运算电路,第一晶粒对晶粒区1012生成有第一收发电路。在此步骤中,在第一核层A生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1603中,生成第一内存层D,第一内存层D包括第一内存区1041,生成有存储单元,用以暂存第二运算电路的运算结果。在此步骤中,在第一内存层D生成收发硅通孔,用以电性连接第一收发电路及第二收发电路。
在步骤1604中,生成第二核层C,第二核层C包括第二运算区1031及第二晶粒对晶粒区1032,其中第二运算区1031生成有第二运算电路,第二晶粒对晶粒区1032生成有第二收发电路。
在步骤1605中,生成第二内存层E,第二内存层E包括第二内存区1051,生成有存储单元,用以暂存第一运算电路或第二运算电路的运算结果。
在此实施例中,第三内存层B、第一核层A、第一内存层D、第二核层C、第二内存层E依序堆叠,更具体来说,第一晶粒对晶粒区1012与第二晶粒对晶粒区1032纵向堆叠,使得第一核层A的晶粒对晶粒接口与第二核层C的晶粒对晶粒接口直接通过收发硅通孔电性连接,不需要利用如图2所示的中介层201进行传输。
本发明的方案是通过将核层纵向堆叠,使得核层的晶粒对晶粒区亦是纵向堆叠,两晶粒对晶粒接口无需通过中介层而是以硅通孔进行数据传输,两晶粒对晶粒接口的传输路径大大缩短了,有助于提高核间的传输效率。
根据不同的应用场景,本发明的电子设备或装置可以包括服务器、云端服务器、服务器集群、数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、PC设备、物联网终端、移动终端、手机、行车记录仪、导航仪、传感器、摄像头、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、视觉终端、自动驾驶终端、交通工具、家用电器、和/或医疗设备。所述交通工具包括飞机、轮船和/或车辆;所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。本发明的电子设备或装置还可以被应用于互联网、物联网、数据中心、能源、交通、公共管理、制造、教育、电网、电信、金融、零售、工地、医疗等领域。进一步,本发明的电子设备或装置还可以用于云端、边缘端、终端等与人工智能、大数据和/或云计算相关的应用场景中。在一个或多个实施例中,根据本发明方案的算力高的电子设备或装置可以应用于云端设备(例如云端服务器),而功耗小的电子设备或装置可以应用于终端设备和/或边缘端设备(例如智能手机或摄像头)。在一个或多个实施例中,云端设备的硬件信息和终端设备和/或边缘端设备的硬件信息相互兼容,从而可以根据终端设备和/或边缘端设备的硬件信息,从云端设备的硬件资源中匹配出合适的硬件资源来模拟终端设备和/或边缘端设备的硬件资源,以便完成端云一体或云边端一体的统一管理、调度和协同工作。
需要说明的是,为了简明的目的,本发明将一些方法及其实施例表述为一系列的动作及其组合,但是本领域技术人员可以理解本发明的方案并不受所描述的动作的顺序限制。因此,依据本发明的公开或教导,本领域技术人员可以理解其中的某些步骤可以采用其他顺序来执行或者同时执行。进一步,本领域技术人员可以理解本发明所描述的实施例可以视为可选实施例,即其中所涉及的动作或模块对于本发明某个或某些方案的实现并不一定是必需的。另外,根据方案的不同,本发明对一些实施例的描述也各有侧重。鉴于此,本领域技术人员可以理解本发明某个实施例中没有详述的部分,也可以参见其他实施例的相关描述。
在具体实现方面,基于本发明的公开和教导,本领域技术人员可以理解本发明所公开的若干实施例也可以通过本文未公开的其他方式来实现。例如,就前文所述的电子设备或装置实施例中的各个单元来说,本文在考虑了逻辑功能的基础上对其进行拆分,而实际实现时也可以有另外的拆分方式。又例如,可以将多个单元或组件结合或者集成到另一个系统,或者对单元或组件中的一些特征或功能进行选择性地禁用。就不同单元或组件之间的连接关系而言,前文结合附图所讨论的连接可以是单元或组件之间的直接或间接耦合。在一些场景中,前述的直接或间接耦合涉及利用接口的通信连接,其中通信接口可以支持电性、光学、声学、磁性或其它形式的信号传输。
在本发明中,作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元示出的部件可以是或者也可以不是物理单元。前述部件或单元可以位于同一位置或者分布到多个网络单元上。另外,根据实际的需要,可以选择其中的部分或者全部单元来实现本发明实施例所述方案的目的。另外,在一些场景中,本发明实施例中的多个单元可以集成于一个单元中或者各个单元物理上单独存在。
在另外一些实现场景中,上述集成的单元也可以采用硬件的形式实现,即为具体的硬件电路,其可以包括数字电路和/或模拟电路等。电路的硬件结构的物理实现可以包括但不限于物理器件,而物理器件可以包括但不限于晶体管或忆阻器等器件。鉴于此,本文所述的各类装置(例如计算装置或其他处理装置)可以通过适当的硬件处理器来实现,例如中央处理器、GPU、FPGA、DSP和ASIC等。进一步,前述的所述存储单元或存储装置可以是任意适当的存储介质(包括磁存储介质或磁光存储介质等),其例如可以是可变电阻式存储器(Resistive Random Access Memory,RRAM)、动态随机存取存储器(Dynamic Random Access Memory,DRAM)、静态随机存取存储器(Static Random Access Memory,SRAM)、增强动态随机存取存储器(Enhanced Dynamic Random Access Memory,EDRAM)、高带宽存储器(High Bandwidth Memory,HBM)、混合存储器立方体(Hybrid Memory Cube,HMC)、ROM和RAM等。
依据以下条款可更好地理解前述内容:
条款A1.一种多核芯片,包括:第一核层,包括:第一运算区,生成有第一运算电路;以及第一晶粒对晶粒区,生成有第一收发电路;第二核层,包括:第二运算区,生成有第二运算电路;以及第二晶粒对晶粒区,生成有第二收发电路;其中,所述第一核层和所述第二核层纵向堆叠,所述第一运算电路及所述第二运算电路通过所述第一收发电路及所述第二收发电路进行层间数据传输。
条款A2.根据条款A1所述的多核芯片,连接至片外内存,还包括内存层,所述内存层包括:内存区,生成有存储单元,用以暂存所述第一运算电路与所述第二运算电路的运算结果;输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口;以及物理区,生成有物理访问电路,用以访问所述片外内存。
条款A3.根据条款A2所述的多核芯片,其中所述内存层位于所述第一核层和所述第二核层间,所述内存层生成有硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A4.根据条款A2所述的多核芯片,其中所述内存区位于所述第一核层和所述第二核层间,所述第二核层生成有硅通孔,用以电性传导所述输入输出电路的数据。
条款A5.根据条款A2所述的多核芯片,其中所述内存区位于所述第一核层和所述第二核层间,所述第二核层生成有硅通孔,用以电性传导所述物理访问电路的数据。
条款A6.根据条款A1所述的多核芯片,还包括:第一内存层,包括第一内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果;以及第二内存层,包括第二内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果;其中,所述第一核层、所述第一内存层、所述第二核层、所述第二内存层依序堆叠,所述第一内存层生成有收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A7.根据条款A6所述的多核芯片,其中所述第一内存层还包括第一输入输出 区,生成有第一输入输出电路,用以作为所述多核芯片对外联系的接口,所述第二核层及所述第二内存层生成有输入输出硅通孔,用以电性传导所述第一输入输出电路的数据。
条款A8.根据条款A6所述的多核芯片,其中所述第二内存层还包括第二输入输出区,生成有第二输入输出电路,通过输入输出硅通孔电性连接至所述多核芯片外。
条款A9.根据条款A6所述的多核芯片,连接至片外内存,其中所述第一内存层还包括第一物理区,生成有物理访问电路,所述第二核层及所述第二内存层生成有物理硅通孔,用以电性传导所述第一运算电路的运算结果至所述片外内存。
条款A10.根据条款A6所述的多核芯片,连接至片外内存,其中所述第二内存层还包括第二物理区,生成有物理访问电路,通过物理硅通孔将所述第二运算电路的运算结果传送至所述片外内存。
条款A11.根据条款A6所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第二核层为背对背制程,所述第二核层与所述第二内存层为面对面制程。
条款A12.根据条款A6所述的多核芯片,还包括第三内存层,所述第三内存层包括第三内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果,其中所述第三内存层位于所述第一核层之上。
条款A13.根据条款A12所述的多核芯片,其中所述第三内存层与所述第一核层为面对面或面对背制程。
条款A14.根据条款A6所述的多核芯片,还包括第四内存层,所述第四内存层包括第四内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果,其中所述第四内存层位于所述第一内存层与所述第二核层间,所述第四内存层生成有收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A15.根据条款A14所述的多核芯片,其中所述第一内存层还包括第一输入输出区,生成有第一输入输出电路,用以作为所述多核芯片对外联系的接口,所述第四内存层、所述第二核层及所述第二内存层生成有输入输出硅通孔,用以电性传导所述第一输入输出电路的数据。
条款A16.根据条款A14所述的多核芯片,连接至片外内存,其中所述第一内存层还包括第一物理区,生成有物理访问电路,所述第四内存层、所述第二核层及所述第二内存层生成有物理硅通孔,用以电性传导所述第一运算电路的运算结果至所述片外内存。
条款A17.根据条款A14所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第四内存层为背对背制程,所述第四内存层与所述第二核层为面对面制程,所述第二核层及所述第二内存层为面对背制程。
条款A18.根据条款A6所述的多核芯片,还包括第三内存层,包括第三内存区,生成有存储单元,用以暂存所述第一运算电路或所述第二运算电路的运算结果,其中,所述第三内存层位于所述第二内存层之下。
条款A19.根据条款A18所述的多核芯片,其中所述第三内存层还包括输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口。
条款A20.根据条款A18所述的多核芯片,连接至片外内存,其中所述第三内存层还包括物理区,生成有物理访问电路,用以电性传导所述第一运算电路及所述第二运算电路的运算结果至所述片外内存。
条款A21.根据条款A18所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第二核层为背对背制程,所述第二核层与所述第二内存层为面对面制程,所述第二内存层与所述第三内存层为面对背制程。
条款A22.根据条款A1至21所述任一项的多核芯片,其中各层以倒装芯片球栅格阵列方式封装。
条款A23.根据条款A1至21所述任一项的多核芯片,其中各层以CoWoS方式封装。
条款A24.一种集成电路装置,包括根据条款A1至21任一项所述的多核芯片。
条款A25.一种板卡,包括根据条款A24所述的集成电路装置。
条款A26.一种制成多核芯片的方法,包括:生成第一核层,所述第一核层包括:第一运算区,生成有第一运算电路;以及第一晶粒对晶粒区,生成有第一收发电路;生成第二核层,所述第二核层包括:第二运算区,生成有第二运算电路;以及第二晶粒对晶粒区,生成有第二收发电路;其中,所述第一核层和所述第二核层纵向堆叠,所述第一运算电路及所述第二运算电路通过所述第一收发电路及所述第二收发电路进行层间数据传输。
条款A27.根据条款A26所述的方法,所述多核芯片连接至片外内存,所述方法还包括在所述第一核层和所述第二核层间生成内存层,所述内存层包括:内存区,生成有存储单元,用以暂存所述第一运算电路与所述第二运算电路的运算结果;输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口;以及物理区,生成有物理访问电路,用以访问所述片外内存。
条款A28.根据条款A27所述的方法,其中所述生成内存层的步骤包括在所述内存层生成有硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A29.根据条款A26所述的方法,还包括:生成第一内存层,包括第一内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果;以及生成第二内存层,包括第二内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果;其中,所述第一核层、所述第一内存层、所述第二核层、所述第二内存层依序堆叠;其中所述生成第一内存层的步骤包括在所述第一内存层生成收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A30.根据条款A29所述的方法,还包括生成第三内存层,所述第三内存层包括第三内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果,其中所述第三内存层位于所述第一核层之上。
条款A31.根据条款A30所述的方法,还包括生成第四内存层,所述第四内存层包括第四内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果,其中所述第四内存层位于所述第一内存层与所述第二核层间,所述生成第四内存层的步骤包括在所述第四内存层生成收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
条款A32.根据条款A29所述的方法,还包括生成第三内存层,包括第三内存区,生成有存储单元,用以暂存所述第一运算电路或所述第二运算电路的运算结果,其中所述第三内存层位于所述第二内存层之下。
以上对本发明实施例进行了详细介绍,本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其核心思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本发明的限制。

Claims (33)

  1. 一种多核芯片,包括:
    第一核层,包括:
    第一运算区,生成有第一运算电路;以及
    第一晶粒对晶粒区,生成有第一收发电路;
    第二核层,包括:
    第二运算区,生成有第二运算电路;以及
    第二晶粒对晶粒区,生成有第二收发电路;
    其中,所述第一核层和所述第二核层纵向堆叠,所述第一运算电路及所述第二运算电路通过所述第一收发电路及所述第二收发电路进行层间数据传输。
  2. 根据权利要求1所述的多核芯片,连接至片外内存,还包括内存层,所述内存层包括:
    内存区,生成有存储单元,用以暂存所述第一运算电路与所述第二运算电路的运算结果;
    输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口;以及
    物理区,生成有物理访问电路,用以访问所述片外内存。
  3. 根据权利要求2所述的多核芯片,其中所述内存层位于所述第一核层和所述第二核层间,所述内存层生成有硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  4. 根据权利要求2所述的多核芯片,其中所述内存区位于所述第一核层和所述第二核层间,所述第二核层生成有硅通孔,用以电性传导所述输入输出电路的数据。
  5. 根据权利要求2所述的多核芯片,其中所述内存区位于所述第一核层和所述第二核层间,所述第二核层生成有硅通孔,用以电性传导所述物理访问电路的数据。
  6. 根据权利要求1所述的多核芯片,还包括:
    第一内存层,包括第一内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果;以及
    第二内存层,包括第二内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果;
    其中,所述第一核层、所述第一内存层、所述第二核层、所述第二内存层依序堆叠,所述第一内存层生成有收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  7. 根据权利要求6所述的多核芯片,其中所述第一内存层还包括第一输入输出区,生成有第一输入输出电路,用以作为所述多核芯片对外联系的接口,所述第二核层及所述第二内存层生成有输入输出硅通孔,用以电性传导所述第一输入输出电路的数据。
  8. 根据权利要求6所述的多核芯片,其中所述第二内存层还包括第二输入输出区,生成有第二输入输出电路,通过输入输出硅通孔电性连接至所述多核芯片外。
  9. 根据权利要求6所述的多核芯片,连接至片外内存,其中所述第一内存层还包括第一物理区,生成有物理访问电路,所述第二核层及所述第二内存层生成有物理硅通孔,用以电性传导所述第一运算电路的运算结果至所述片外内存。
  10. 根据权利要求6所述的多核芯片,连接至片外内存,其中所述第二内存层还包括第二物理区,生成有物理访问电路,通过物理硅通孔将所述第二运算电路的运算结果传送至所述片外内存。
  11. 根据权利要求6所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第二核层为背对背制程,所述第二核层与所述第二内存层为面对面制程。
  12. 根据权利要求6所述的多核芯片,还包括第三内存层,所述第三内存层包括第三内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果,其中所述第三内存层位于所述第一核层之上。
  13. 根据权利要求12所述的多核芯片,其中所述第三内存层与所述第一核层为面对面或面对背制程。
  14. 根据权利要求6所述的多核芯片,还包括第四内存层,所述第四内存层包括第四内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果,其中所述第四内存层位于所述第一内存层与所述第二核层间,所述第四内存层生成有收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  15. 根据权利要求14所述的多核芯片,其中所述第一内存层还包括第一输入输出区,生成有第一输入输出电路,用以作为所述多核芯片对外联系的接口,所述第四内存层、所述第二核层及所述第二内存层生成有输入输出硅通孔,用以电性传导所述第一输入输出电路的数据。
  16. 根据权利要求14所述的多核芯片,连接至片外内存,其中所述第一内存层还包括第一物理区,生成有物理访问电路,所述第四内存层、所述第二核层及所述第二内存层生成有物理硅通孔,用以电性传导所述第一运算电路的运算结果至所述片外内存。
  17. 根据权利要求14所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第四内存层为背对背制程,所述第四内存层与所述第二核层为面对面制程,所述第二核层及所述第二内存层为面对背制程。
  18. 根据权利要求6所述的多核芯片,还包括第三内存层,包括第三内存区,生成有存储单元,用以暂存所述第一运算电路或所述第二运算电路的运算结果,其中,所述第三内存层位于所述第二内存层之下。
  19. 根据权利要求18所述的多核芯片,其中所述第三内存层还包括输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口。
  20. 根据权利要求18所述的多核芯片,连接至片外内存,其中所述第三内存层还包括物理区,生成有物理访问电路,用以电性传导所述第一运算电路及所述第二运算电路的运算结果至所述片外内存。
  21. 根据权利要求18所述的多核芯片,其中所述第一核层与所述第一内存层为面对面制程,所述第一内存层与所述第二核层为背对背制程,所述第二核层与所述第二内存层为面对面制程,所述第二内存层与所述第三内存层为面对背制程。
  22. 根据权利要求1至21所述任一项的多核芯片,其中各层以倒装芯片球栅格阵列方式封装。
  23. 根据权利要求1至21所述任一项的多核芯片,其中各层以CoWoS方式封装。
  24. 一种集成电路装置,包括根据权利要求1至21任一项所述的多核芯片。
  25. 一种板卡,包括根据权利要求24所述的集成电路装置。
  26. 一种制成多核芯片的方法,包括:
    生成第一核层,所述第一核层包括:
    第一运算区,生成有第一运算电路;以及
    第一晶粒对晶粒区,生成有第一收发电路;
    生成第二核层,所述第二核层包括:
    第二运算区,生成有第二运算电路;以及
    第二晶粒对晶粒区,生成有第二收发电路;
    其中,所述第一核层和所述第二核层纵向堆叠,所述第一运算电路及所述第二运算电路通过所述第一收发电路及所述第二收发电路进行层间数据传输。
  27. 根据权利要求26所述的方法,所述多核芯片连接至片外内存,所述方法还包括在所述第一核层和所述第二核层间生成内存层,所述内存层包括:
    内存区,生成有存储单元,用以暂存所述第一运算电路与所述第二运算电路的运算结果;
    输入输出区,生成有输入输出电路,用以作为所述多核芯片对外联系的接口;以及
    物理区,生成有物理访问电路,用以访问所述片外内存。
  28. 根据权利要求27所述的方法,其中所述生成内存层的步骤包括在所述内存层生成有硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  29. 根据权利要求26所述的方法,还包括:
    生成第一内存层,包括第一内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果;以及
    生成第二内存层,包括第二内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果;
    其中,所述第一核层、所述第一内存层、所述第二核层、所述第二内存层依序堆叠;
    其中所述生成第一内存层的步骤包括在所述第一内存层生成收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  30. 根据权利要求29所述的方法,还包括生成第三内存层,所述第三内存层包括第三内存区,生成有存储单元,用以暂存所述第一运算电路的运算结果,其中所述第三内存层位于所述第一核层之上。
  31. 根据权利要求30所述的方法,还包括生成第四内存层,所述第四内存层包括第四内存区,生成有存储单元,用以暂存所述第二运算电路的运算结果,其中所述第四内存层位于所述第一内存层与所述第二核层间,所述生成第四内存层的步骤包括在所述第四内存层生成收发硅通孔,用以电性连接所述第一收发电路及所述第二收发电路。
  32. 根据权利要求29所述的方法,还包括生成第三内存层,包括第三内存区,生成有存储单元,用以暂存所述第一运算电路或所述第二运算电路的运算结果,其中所述第三内存层位于所述第二内存层之下。
PCT/CN2022/122372 2021-10-08 2022-09-29 多核芯片、集成电路装置、板卡及其制程方法 WO2023056875A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111172907.2A CN115966534A (zh) 2021-10-08 2021-10-08 多核芯片、集成电路装置、板卡及其制程方法
CN202111172907.2 2021-10-08

Publications (1)

Publication Number Publication Date
WO2023056875A1 true WO2023056875A1 (zh) 2023-04-13

Family

ID=85803920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122372 WO2023056875A1 (zh) 2021-10-08 2022-09-29 多核芯片、集成电路装置、板卡及其制程方法

Country Status (3)

Country Link
CN (1) CN115966534A (zh)
TW (1) TWI814179B (zh)
WO (1) WO2023056875A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870171A (zh) * 2005-05-25 2006-11-29 尔必达存储器株式会社 半导体存储装置
US20110119322A1 (en) * 2009-11-13 2011-05-19 International Business Machines Corporation On-Chip Networks for Flexible Three-Dimensional Chip Integration
US9886275B1 (en) * 2013-10-09 2018-02-06 Mellanox Technologies Ltd. Multi-core processor using three dimensional integration
CN113097198A (zh) * 2019-12-23 2021-07-09 爱思开海力士有限公司 层叠式半导体器件及其测试方法

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6618117B2 (en) * 1997-07-12 2003-09-09 Silverbrook Research Pty Ltd Image sensing apparatus including a microcontroller
EP1883109B1 (en) * 2006-07-28 2013-05-15 Semiconductor Energy Laboratory Co., Ltd. Memory element and method of manufacturing thereof
JP6231489B2 (ja) * 2011-12-01 2017-11-15 ザ ボード オブ トラスティーズ オブ ザ ユニヴァーシティー オブ イリノイ プログラム可能な変化を被るように設計された遷移デバイス

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN1870171A (zh) * 2005-05-25 2006-11-29 尔必达存储器株式会社 半导体存储装置
US20110119322A1 (en) * 2009-11-13 2011-05-19 International Business Machines Corporation On-Chip Networks for Flexible Three-Dimensional Chip Integration
US9886275B1 (en) * 2013-10-09 2018-02-06 Mellanox Technologies Ltd. Multi-core processor using three dimensional integration
CN113097198A (zh) * 2019-12-23 2021-07-09 爱思开海力士有限公司 层叠式半导体器件及其测试方法

Also Published As

Publication number Publication date
TWI814179B (zh) 2023-09-01
CN115966534A (zh) 2023-04-14
TW202316921A (zh) 2023-04-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877894

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 18698629

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 22877894

Country of ref document: EP

Kind code of ref document: A1