CN113515240A

CN113515240A - Chip computing device and computing system

Info

Publication number: CN113515240A
Application number: CN202111033168.9A
Authority: CN
Inventors: 左丰国; 江喜平; 郭一欣; 周骏
Original assignee: Xian Unilc Semiconductors Co Ltd
Current assignee: Xian Unilc Semiconductors Co Ltd
Priority date: 2021-09-03
Filing date: 2021-09-03
Publication date: 2021-10-19

Abstract

The application discloses a chip computing device and a computing system, relating to the technical field of integrated chips, which can increase memory access bandwidth and reduce the power consumption of memory access. The chip computing device comprises: a memory array chip comprising a memory array for storing data; an operation array chip comprising an operation array, the operation array comprising at least one fixed operation unit for implementing a fixed operation function; and a programmable array chip comprising a programmable array for dynamically scheduling the execution flow of the at least one fixed operation unit to obtain a target operation function. Any two of the memory array chip, the programmable array chip and the operation array chip are stacked and connected through three-dimensional heterogeneous integration.

Description

Chip computing device and computing system

Technical Field

The present application relates to the field of integrated chip technology, and in particular, to a chip computing device and a computing system.

Background

To adapt hardware to rapidly evolving computational network architectures, reconfiguring computing power through programmable methods has become an important development direction in the art.

However, as the scale of machine learning computation grows, the memory access volume and bandwidth demanded of conventional reconfigurable computing units increase sharply, while the available memory access bandwidth is narrow and the power consumption overhead of memory access is large, so a memory wall easily forms.

Disclosure of Invention

The embodiments of the application provide a chip computing device and a computing system, which can increase memory access bandwidth and reduce the power consumption of memory access.

In a first aspect of embodiments of the present application, there is provided a chip computing device, including:

a memory array chip, comprising a memory array, wherein the memory array is used for storing data;

the operation array chip comprises an operation array, wherein the operation array comprises at least one fixed operation unit, and the fixed operation unit is used for realizing a fixed operation function;

the programmable array chip comprises a programmable array, wherein the programmable array is used for dynamically scheduling the execution flow of at least one fixed operation unit to obtain a target operation function;

any two of the memory array chip, the programmable array chip and the operation array chip are stacked and connected through three-dimensional heterogeneous integration.

In some embodiments, there is at least one target operation function.

In some embodiments, one of the memory array chip, the programmable array chip, and the operation array chip comprises an external global memory access interface controller and an external global access bus, the external global memory access interface controller being connected to the external global access bus;

the memory array chip, the programmable array chip or the operation array chip comprises an internal global access bus, which is used for connecting the functional arrays within that chip;

three-dimensional interconnection interfaces are arranged between adjacent chips among the memory array chip, the programmable array chip and the operation array chip;

the memory array chip, the programmable array chip and the operation array chip all comprise internal local memory access lines, and the internal local memory access lines are respectively connected with the three-dimensional interconnection interface and the functional array.

In some embodiments, the memory array chip, the programmable array chip, and the operational array chip each include external memory access lines that are connected to the three-dimensional interconnect interface and the functional array, respectively.

In some embodiments, a computation interconnection line is included in the operation array chip, and the computation interconnection line is respectively connected with the three-dimensional interconnection interface and the operation array.

In some embodiments, the memory array chip, the programmable array chip, and the operational array chip each include an active layer and an internal metal layer;

the active layer comprises the functional array and/or the external global memory access interface controller;

the internal metal layer comprises the external global access bus, and/or an internal local memory access line, and/or the internal global access bus, and/or an external memory access line, and/or a computation interconnection line.

In some embodiments, adjacent chips among the memory array chip, the programmable array chip, and the operation array chip are connected by a three-dimensional heterogeneous connection structure;

the three-dimensional heterogeneous connection structure comprises the three-dimensional interconnection interface.

In some embodiments, the orthographic projection of the memory array on the programmable array chip covers the programmable array.

In some embodiments, the programmable array chip includes a controller, the programmable array being coupled to the controller;

the controller is used for controlling the programmable array to dynamically schedule the execution flow of at least one fixed operation unit to obtain the target operation function; and/or

the controller is configured to control the programmable array to perform a current target operation function based on result data obtained by performing a previous target operation function.

In some embodiments, the memory array chip comprises at least two chip layers; and/or

the programmable array chip comprises at least two chip layers; and/or

the operation array chip comprises at least two chip layers.

In some embodiments, any two of the memory array, the programmable array, and the operation array are disposed in the same chip layer.

In some embodiments, at least one of the memory array chip, the programmable array chip, and the operational array chip includes level shifting circuitry.

In some embodiments, the memory array comprises one or a combination of at least two of static random access memory, dynamic random access memory, Flash memory, ferroelectric memory, phase change memory, magnetic memory, and resistive memory.

In some embodiments, the memory array chip includes at least one of a memory array die and a memory array wafer; and/or

the operation array chip comprises at least one of an operation array die and an operation array wafer; and/or

the programmable array chip comprises at least one of a programmable array die and a programmable array wafer.

In a second aspect of the embodiments of the present application, there is provided a computing system, including: a host system and the chip computing device described in the first aspect;

an external lead-out interface is arranged on the chip of the chip computing device that is provided with the external global memory access interface controller;

the chip computing device is connected with the host system through the external lead-out interface.

In the chip computing device and the computing system provided by the embodiments of the application, the programmable array dynamically schedules the execution flow of at least one fixed operation unit to obtain a target operation function. Because a variety of different target operation functions can be obtained by dynamically scheduling the execution flow of conventional fixed operation units, the operation function of the chip computing device is reconfigurable. Any two of the memory array chip, the programmable array chip and the operation array chip are stacked and connected through three-dimensional heterogeneous integration, which provides very large local bus bandwidth both between adjacent chips and across chips, thereby increasing memory access bandwidth and reducing memory access power consumption. Unlike a traditional I/O (input/output) interface connection, the programmable array and the operation array are each connected to the memory array through three-dimensional heterogeneous integration stacking, so memory access is completed inside the chip computing device with high bandwidth and low power consumption. The computing burden of the host system can thus be transferred to the chip computing device, offloading computing work from the host system, improving memory access efficiency and reducing power consumption.

Drawings

Fig. 1 is a schematic structural diagram of a chip computing device according to an embodiment of the present disclosure;

Fig. 2 is a schematic partial cross-sectional structure diagram of a chip computing device according to an embodiment of the present disclosure;

Fig. 3 is a schematic structural block diagram of a computing system according to an embodiment of the present application.

Detailed Description

In order to better understand the technical solutions provided by the embodiments of the present specification, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features in the embodiments and examples of the present specification are detailed descriptions of the technical solutions of the embodiments of the present specification, and are not limitations on the technical solutions of the embodiments of the present specification, and the technical features in the embodiments and examples of the present specification may be combined with each other without conflict.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. The term "a plurality of" includes the case of two or more.

To adapt hardware to rapidly evolving computational network architectures, reconfiguring computing power through programmable methods has become an important development direction in the art. However, as the scale of machine learning computation grows, the memory access volume and bandwidth demanded of conventional reconfigurable computing units increase sharply, while the available memory access bandwidth is narrow and the power consumption overhead of memory access is large, so a memory wall easily forms.

In view of this, embodiments of the present application provide a chip computing device and a computing system, which can increase the bandwidth of memory access and reduce the power consumption generated by memory access.

In a first aspect of the embodiments of the present application, a chip computing device is provided; Fig. 1 is a schematic structural diagram of this chip computing device. As shown in Fig. 1, the chip computing device provided in the embodiments of the present application includes a memory array chip 100, a programmable array chip 200, and an operation array chip 300. The memory array chip 100 includes a memory array 110 for storing data; the operation array chip 300 comprises an operation array 310, which comprises at least one fixed operation unit for implementing a fixed operation function; the programmable array chip 200 comprises a programmable array 210, which dynamically schedules the execution flow of the at least one fixed operation unit to obtain a target operation function; and any two of the memory array chip 100, the programmable array chip 200 and the operation array chip 300 are stacked and connected through three-dimensional heterogeneous integration.

The fixed operation functions of the fixed operation units include, but are not limited to, any combination of a multiplier-adder, a multiplier, a systolic processor, hash calculation, an encoder/decoder, a digital signal processor, and dedicated computing circuits such as machine learning circuits. The operation array 310 may further include hard-core IP (hard macro IP), which may be understood as existing, solidified and efficient operation units (hardware devices); including them increases the effective operation density, that is, the density of the computing devices, and thereby the types and number of available operation functions, which is not specifically limited in the present application. At least one hard-core IP may constitute a fixed operation unit, which is not specifically limited in this application. The fixed operation units are typically hard-core IPs of relatively small granularity, so that the programmable array 210 on the programmable array chip 200 can combine them into a reconfigurable operation array with a certain degree of generality.

The programmable array 210 may be an FPGA (field programmable gate array) or an eFPGA (embedded field programmable gate array), whose programmability can be used to dynamically schedule the execution flow of the fixed operation units, which is not specifically limited in this application. The programmable array 210 may also include other hard-core IPs to extend its operational functionality, which is not specifically limited in this application. The programmable array 210 obtains a target operation function by dynamically scheduling the execution flow of at least one fixed operation unit; the target operation function may be a combination in which one or more fixed operation functions are executed in a specific order, so that it implements a more complex, composite data processing procedure. For example, one target operation function may perform addition → multiplication → hash operation → encryption → decryption. Obtaining the target operation function by dynamic scheduling means that, through programmable adjustment, the programmable array 210 schedules the types and order of the fixed operation units and the data input to them, implementing a reconfigurable operation that matches the target operation function. An interconnection interface is arranged between the programmable array 210 and the operation array 310 to provide the cross-chip, three-dimensional heterogeneous integration stacked connection, through which the programmable array 210 dynamically schedules the operation array 310, forming a programmable operation storage array.

For example, the programmable array 210 may dynamically schedule the execution flow of the fixed operation units based on an instruction sequence or a configuration file. The instruction sequence and the configuration file may be pre-stored in the programmable array 210 or the memory array 110, which avoids external memory accesses and the power consumption they would add. Alternatively, the instruction sequence and the configuration file may be sent to the programmable array 210 by the host system, so that the type of target operation function obtained by the programmable array 210 can be flexibly adjusted, which is not specifically limited in this application.

For example, the data stored in the memory array 110 may include a target instruction issued by the host system, and the target instruction may include an instruction sequence. The data stored in the memory array 110 may also include data to be processed issued by the host system, together with the result data obtained after the programmable array 210 and the operation array 310 process that data; the result data may in turn serve as input data for other target operation functions, and the host system may read the result data from the memory array 110, which is not specifically limited in this application.
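The data flow described above (host loads instructions and data once, results stay on-device and feed the next target operation function, host reads the final result) can be modeled roughly as follows. This is a sketch under simplifying assumptions: `StorageArray`, `execute`, the region names, and the two-opcode instruction format are all hypothetical and stand in for the memory array 110 and the programmable/operation arrays.

```python
from typing import Dict, List


class StorageArray:
    """Toy stand-in for memory array 110: named regions written by host or device."""

    def __init__(self) -> None:
        self.regions: Dict[str, object] = {}

    def write(self, name: str, value: object) -> None:
        self.regions[name] = value

    def read(self, name: str) -> object:
        return self.regions[name]


def execute(storage: StorageArray, program: str, src: str, dst: str) -> None:
    """Run one target operation function entirely against local storage."""
    ops = storage.read(program)  # hypothetical instruction sequence, already on-chip
    data: List[int] = list(storage.read(src))
    for op, operand in ops:
        if op == "add":
            data = [x + operand for x in data]
        elif op == "mul":
            data = [x * operand for x in data]
        else:
            raise ValueError(f"unknown op {op!r}")
    storage.write(dst, data)  # result stays in the memory array


mem = StorageArray()
# Host system loads instructions and data once.
mem.write("prog_1", [("add", 1)])
mem.write("prog_2", [("mul", 10)])
mem.write("input", [1, 2, 3])
# The previous target function's result feeds the current one without leaving the device.
execute(mem, "prog_1", "input", "result_1")
execute(mem, "prog_2", "result_1", "result_2")
print(mem.read("result_2"))  # [20, 30, 40]
```

Only the initial load and the final read cross the host boundary; the intermediate `result_1` never does.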

In the chip computing device provided in the embodiments of the present application, the programmable array 210 dynamically schedules the execution flow of at least one fixed operation unit to obtain a target operation function; because scheduling the execution flow of conventional fixed operation units in different ways yields a plurality of different target operation functions, the operation function of the chip computing device is reconfigurable. Any two of the memory array chip 100, the programmable array chip 200 and the operation array chip 300 are stacked and connected through three-dimensional heterogeneous integration, which provides very large bandwidth between adjacent chips and across chips, thereby increasing memory access bandwidth and reducing memory access power consumption. Unlike a traditional I/O (input/output) interface connection, the programmable array 210 and the operation array 310 are each connected directly to the memory array 110 by signal stacking through three-dimensional heterogeneous integration, so memory access is completed inside the chip computing device with high bandwidth and low power consumption. The computing burden of the host system is transferred to the chip computing device, offloading computing work from the host system and sparing computing units from having to fetch operation data across the memory wall, which improves memory access efficiency and reduces power consumption.

In some embodiments, there is at least one target operation function, each obtained by the programmable array 210 dynamically scheduling the execution flow of at least one fixed operation unit. The more target operation functions there are, the more computation steps the chip computing device completes internally, the higher its operation and memory access efficiency, and the more computing work it takes over from the host system, which improves memory access efficiency and reduces power consumption.

In some embodiments, one of the memory array chip, the programmable array chip, and the operation array chip comprises an external global memory access interface controller and an external global access bus, the external global memory access interface controller being connected to the external global access bus; the memory array chip, the programmable array chip or the operation array chip comprises an internal global access bus for connecting the functional arrays within that chip; three-dimensional interconnection interfaces are arranged between adjacent chips among the memory array chip, the programmable array chip and the operation array chip; and the memory array chip, the programmable array chip and the operation array chip each comprise internal local memory access lines, which are connected to the three-dimensional interconnection interface and to the functional array, respectively. The functional array may be the memory array 110, the programmable array 210, or the operation array 310. Whichever of the memory array chip 100, the programmable array chip 200 and the operation array chip 300 communicates with the outside can be arranged at the top of the chip computing device, and this top chip can be provided with the external global memory access interface controller and the external global access bus. The external global memory access interface controller controls the connection between the external global access bus and an external device, enabling data interaction between the chip computing device and the external device; the external device may be a host system or another device, which is not specifically limited in the present application.
The internal global access bus connects the functional arrays within a chip, and the internal local memory access lines connect the three-dimensional interconnection interface to the functional arrays. The external global access bus in the top chip can therefore exchange data across chips through the three-dimensional interconnection interfaces and the internal local memory access lines, while the internal global access bus exchanges data between functional arrays within the same chip.
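Which interconnect carries a given access can be summarized as a small routing function. This is a simplified, hypothetical model: the stack order in `STACK` (memory array chip on top) is only one of the arrangements the text allows, and the segment labels are illustrative strings, not signal names from the patent.

```python
from typing import List, Union

# Hypothetical stack order; index 0 = top chip carrying the external controller.
STACK: List[str] = [
    "memory_array_chip_100",
    "programmable_array_chip_200",
    "operation_array_chip_300",
]


def route(src: Union[str, int], dst: int, stack: List[str] = STACK) -> List[str]:
    """List the interconnect segments an access uses between src and dst layers."""
    segments: List[str] = []
    if src == "external":
        # Host traffic enters through the top chip's external global access bus.
        segments.append(f"external global access bus @ {stack[0]}")
        start = 0
    else:
        start = int(src)
        if start == dst:
            # Functional arrays on the same chip use that chip's internal global access bus.
            return [f"internal global access bus @ {stack[dst]}"]
    step = 1 if dst > start else -1
    for layer in range(start, dst, step):
        # Each chip boundary is crossed via internal local memory access lines
        # and the 3D interconnection interface between the two chips.
        segments.append(f"local lines + 3D interface {stack[layer]} <-> {stack[layer + step]}")
    return segments


print(route("external", 2))
```

In this model an external access to the bottom chip uses the top chip's external bus plus one local-line hop per chip boundary, while same-chip traffic never leaves the internal global access bus.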

Illustratively, when memory access takes place between the chip computing device and an external device, the external global memory access interface controller occupies the external global access bus for data transmission. When memory access takes place between chips inside the chip computing device, the external global memory access interface controller controls the occupation of the external global access bus, the internal local memory access lines and the internal global access bus: the internal local memory access lines connect functional arrays across chips, and the internal global access bus connects functional arrays in the same chip layer, so that the external global access bus is connected to functional arrays across chips. This enables data transmission between adjacent chips, or across chips, inside the chip computing device, and allows time-division multiplexing of the external global access bus and the internal global access bus.

In some embodiments, the memory array chip, the programmable array chip and the operation array chip each include external memory access lines, which are connected to the three-dimensional interconnection interface and to the functional array, respectively. Through the three-dimensional interconnection interfaces, the external memory access lines can connect functional arrays on different chips, and they can be time-division multiplexed. For example, for cross-chip external data access, the external memory access lines together with the external global access bus carry the access; for cross-chip internal data access, the external memory access lines alone can carry the data exchange between chips inside the chip computing device. It should be noted that internal global access and the activity of the external global memory access interface controller correspond, respectively, to internal computation in the chip computing device and to the host system loading data into, or reading data from, the chip computing device, and the two do not occur at the same time. Therefore, whenever the host system is not loading or reading data, the external global access bus can be connected to the external memory access lines to implement cross-array data exchange for the operation array 310 and/or the programmable array 210.

Illustratively, when memory access takes place between the chip computing device and an external device, the external global memory access interface controller occupies the external global access bus for data transmission. When memory access takes place between chips inside the chip computing device, the external global memory access interface controller releases the external global access bus through internal circuits such as a multiplexer, and the time-division-multiplexed external memory access lines, or the time-division-multiplexed internal local memory access lines, are interconnected with the released external global access bus. This implements cross-array data exchange for the operation array 310 and/or the programmable array 210, that is, internal global access.
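Because host loading/reading and internal computation never overlap, ownership of the external global access bus can be decided purely by the current phase, much like a 2:1 multiplexer select signal. The sketch below assumes three hypothetical phases (`HOST_LOAD`, `COMPUTE`, `HOST_READ`) and two hypothetical requester names; it models only the time-division policy, not the circuit.

```python
from enum import Enum, auto
from typing import List


class Phase(Enum):
    HOST_LOAD = auto()  # host system loads instructions/data into the device
    COMPUTE = auto()    # internal computation: cross-array data exchange
    HOST_READ = auto()  # host system reads result data back


def bus_owner(phase: Phase) -> str:
    """Who drives the external global access bus in a given phase (toy 2:1 mux)."""
    if phase in (Phase.HOST_LOAD, Phase.HOST_READ):
        return "interface_controller"
    # Controller releases the bus; the mux joins it to the time-division-multiplexed
    # external memory access lines or internal local memory access lines.
    return "internal_lines"


def grant(requester: str, phase: Phase) -> bool:
    """Grant the bus only to the requester that owns it in this phase."""
    return bus_owner(phase) == requester


timeline: List[Phase] = [Phase.HOST_LOAD, Phase.COMPUTE, Phase.COMPUTE, Phase.HOST_READ]
print([bus_owner(p) for p in timeline])
```

Since exactly one requester owns the bus in each phase, no arbitration between the host path and the internal path is needed in this model.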

In some embodiments, the operation array chip includes a computation interconnection line, which is connected to the three-dimensional interconnection interface and to the operation array, respectively. The programmable array chip may also include a computation interconnection line; through the three-dimensional interconnection interface, the computation interconnection lines can connect operation arrays to one another and can also connect the programmable array to the operation array.

For example, the memory array chip 100 may be disposed on the top layer of the chip computing device as the chip that communicates with the outside; a chip computing device stacked as memory array chip 100 → programmable array chip 200 → operation array chip 300 can meet most memory access requirements and supports reconfigurable time-division multiplexing of internal global access bus and external global access bus resources. Alternatively, the operation array chip 300 may be disposed on the top layer as the chip that communicates with the outside; a chip computing device stacked as operation array chip 300 → programmable array chip 200 → memory array chip 100 can also meet most memory access requirements.

For applications that require frequent, time-sensitive dynamic reconfiguration, the programmable array chip 200 may be disposed on the top layer of the chip computing device as the chip that communicates with the outside, so as to benefit from the density of I/O brought in from external devices on the programmable array chip 200; examples are the stack orders programmable array chip 200 → operation array chip 300 → memory array chip 100 and programmable array chip 200 → memory array chip 100 → operation array chip 300.

For applications in which data is loaded into storage once, little input/output occurs during operation, and external input/output can be handled directly by the operation array chip 300, such as database retrieval, the operation array chip 300 can be disposed on the top layer of the chip computing device as the chip that communicates with the outside, for example in the stack order operation array chip 300 → memory array chip 100 → programmable array chip 200.

For applications in which the memory access behavior from the operation array chip 300 to the memory array chip 100 is relatively fixed, especially those with relatively fixed point-to-point memory accesses across array regions, the density advantage of the memory access interconnect can be obtained by placing the operation array chip 300 close to the memory array chip 100, for example memory array chip 100 → programmable array chip 200 → operation array chip 300.

Illustratively, the memory array 110 may include one or a combination of at least two of SRAM (static random access memory), DRAM (dynamic random access memory), Flash memory, FRAM (ferroelectric memory), PRAM (phase change memory), MRAM (magnetic memory), and RRAM (resistive memory), and the memory array 110 may be provided with a corresponding memory controller according to the memory type, which is not particularly limited in this application.

For example, the memory array chip 100 may further include an external global memory access interface controller for any memory type, such as an SRAM interface controller, a JEDEC DRAM interface controller, a Flash interface controller, an AXI (Advanced eXtensible Interface, a bus protocol) interface controller, or another custom interface protocol controller, through which an external device performs global memory access to the memory array 110; the external device may be a host system or another device, which is not particularly limited in the application. The memory array chip 100 may also include an external global access bus running from the external global memory access interface controller to all memory arrays 110 of the memory array chip 100, in forms including but not limited to NoC, AXI and AHB, to enable external global memory access. The interconnection bit width of this external global access bus does not need to approach the sum of the internal local memory access bit widths of the chip computing device (usually tens of thousands to hundreds of thousands of bits); the design can follow the bit widths used in the prior art (tens to thousands of bits), because generally only the input data and the result data of an operation (which may be a computing process or a processing process) pass through the external global access bus, so the memory access volume through the external global memory access interface controller is smaller than the total memory access volume generated over a plurality of operation steps.
By adopting three-dimensional heterogeneous integration stacking, the cross-chip connections of the memory array chip 100 need not pass through I/O circuits: for short-distance, lightly loaded three-dimensional heterogeneous integration interconnects within the same package, the functions that prior-art I/O circuits provide, such as driving, shifting up to the external signal level (on output) and down from it (on input), tri-state control, ESD protection and surge protection, can be eliminated.
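The argument that the external global access bus can be much narrower than the sum of the internal local widths comes down to simple traffic arithmetic. The sketch below assumes, hypothetically, that each of `n_steps` operation steps reads and writes all `n_elements` operands internally, while only the initial inputs and final results cross the external bus; the element size of 4 bytes is also an assumption.

```python
from typing import Tuple


def traffic_bytes(n_elements: int, n_steps: int, bytes_per_element: int = 4) -> Tuple[int, int]:
    """Return (external, internal) traffic for an n_steps pipeline kept on-device."""
    # External global access bus: inputs loaded once, results read back once.
    external = 2 * n_elements * bytes_per_element
    # Internal local accesses: every step reads its operands and writes its results.
    internal = 2 * n_elements * n_steps * bytes_per_element
    return external, internal


ext_bytes, int_bytes = traffic_bytes(n_elements=1_000_000, n_steps=5)
print(ext_bytes, int_bytes)  # 8000000 40000000
```

Under these assumptions the internal-to-external traffic ratio equals the number of operation steps, so the more steps a target operation function chains together, the less the external bus width matters.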

The operation array 310 and/or the programmable array 210 form, together with the storage array 110, a programmable operation storage array through a three-dimensional interconnection interface. Cross-region storage access between programmable operation storage arrays can be realized through an internal global access bus and/or a time-division multiplexed internal global access interface. The internal global access bus may take forms such as NoC, AXI, or AHB, without limitation in this application. Its bus width is designed with reference to the width of the internal local storage access interfaces, but it need not reach the sum of the widths of all internal local storage access interfaces: the operation steps of common target operation functions can largely avoid storage access across programmable operation storage arrays, so cross-region storage access is less likely than in-region storage access. The internal global access bus and/or the time-division multiplexed internal global access interface provide data channels for storage access across programmable operation storage arrays with different sources and destinations.

Because cross-region storage access is relatively infrequent in practice, the dynamic partial reconfiguration capability of the programmable routing network within and between the programmable arrays 210 can be exploited: an additional point-to-point channel (for example, one that increases the bus bit width) is established on top of the internal global access bus, and the programmable resources are dynamically reclaimed once the cross-region storage access is finished. In this way, an internal global access bus designed as hard-core IP, combined with reconfigurable point-to-point interconnection channels, yields an internal global access bus whose bit width varies with the requirements of each operation step.
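As a rough illustration of the variable-bit-width internal global access bus described above, the following Python sketch keeps track of how many reconfigurable point-to-point lanes are lent to each cross-region access. It is a hypothetical bookkeeping model, not the implementation of this application: the class `VariableWidthGlobalBus`, the lane counts, and the region labels are illustrative assumptions, and each access is assumed to receive the full hard-core IP width plus whatever lanes it borrows.

```python
from dataclasses import dataclass, field


@dataclass
class VariableWidthGlobalBus:
    """Hypothetical model: a fixed hard-core IP bus plus a pool of reconfigurable
    point-to-point lanes borrowed from the programmable routing network."""
    hard_width: int   # bits provided by the hard-core IP bus (assumed value)
    free_lanes: int   # reconfigurable bits not currently lent out (assumed value)
    channels: dict = field(default_factory=dict)  # (src, dst) -> borrowed bits

    def open_access(self, src: str, dst: str, required_width: int) -> int:
        """Set up a cross-region access and return its effective bit width."""
        extra = max(0, required_width - self.hard_width)
        extra = min(extra, self.free_lanes)  # cannot borrow more than is free
        self.free_lanes -= extra
        self.channels[(src, dst)] = extra
        return self.hard_width + extra

    def close_access(self, src: str, dst: str) -> None:
        """Reclaim the borrowed lanes once the cross-region access has finished."""
        self.free_lanes += self.channels.pop((src, dst))


bus = VariableWidthGlobalBus(hard_width=512, free_lanes=2048)
print(bus.open_access("region0", "region3", 1024), bus.free_lanes)  # 1024 1536
bus.close_access("region0", "region3")
print(bus.free_lanes)  # 2048
```

Setting `hard_width=0` models a bus built entirely from reconfigurable point-to-point channels. Contention among concurrent accesses for the hard-core IP bus is deliberately left out to keep the sketch short.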

Illustratively, the variable-bit-width internal global access bus may consist of an internal global access bus designed as hard-core IP plus reconfigurable point-to-point interconnection channels, or may consist entirely of reconfigurable point-to-point interconnection channels. Dynamically reconfigured time-division multiplexing of the external global access bus and the internal global access bus can further reduce global interconnection overhead by exploiting the following application and architecture characteristics: (1) both buses can be transferred across chips to the programmable array chip 200, where the dynamic partial reconfiguration capability of the programmable routing network within and between the programmable arrays 210 allows their functions to be recombined dynamically; (2) both buses are data channels that interconnect the storage accesses of all programmable operation storage arrays, and their physical distributions are similar; (3) in typical computing applications, the two buses are heavily loaded at different times: the external global access bus is busy mainly before an operation starts and after it ends, while the internal global access bus is busy during the operation.
Accordingly, part or all of the external global access bus and the internal global access bus are interconnected through the programmable routing network within and between the programmable arrays 210, and the resulting reconfigurable global bus is managed by dynamic reconfiguration: when external global access bandwidth is needed, part or all of the reconfigurable global bus resources are switched to the external global access bus; when internal global access bandwidth is needed, part or all of them are switched to the internal global access bus.
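The switching of merged bus resources between the external and internal global access buses can be sketched as below. This is a hypothetical Python model that assumes the reconfigurable global bus is divided into interchangeable segments, with the remaining segments staying dedicated to each bus; the names `ReconfigurableGlobalBus` and `Role` and all segment counts are illustrative, and the three-phase schedule only mirrors the load pattern described in the text.

```python
from enum import Enum


class Role(Enum):
    EXTERNAL = "external global access bus"
    INTERNAL = "internal global access bus"


class ReconfigurableGlobalBus:
    """Hypothetical model of the merged global bus: some segments stay dedicated
    to each bus, the shared ones are switched by dynamic reconfiguration."""

    def __init__(self, shared: int, dedicated_external: int, dedicated_internal: int) -> None:
        self.shared = shared
        self.dedicated = {Role.EXTERNAL: dedicated_external, Role.INTERNAL: dedicated_internal}
        self.assigned = {Role.EXTERNAL: 0, Role.INTERNAL: 0}

    def reconfigure(self, role: Role, segments: int) -> None:
        """Give `segments` shared segments to `role` and the remainder to the other bus."""
        if not 0 <= segments <= self.shared:
            raise ValueError("segment count out of range")
        other = Role.INTERNAL if role is Role.EXTERNAL else Role.EXTERNAL
        self.assigned[role] = segments
        self.assigned[other] = self.shared - segments

    def width(self, role: Role) -> int:
        """Segments currently usable by `role` (dedicated plus switched-in)."""
        return self.dedicated[role] + self.assigned[role]


# External traffic clusters before and after an operation; internal traffic
# dominates while the operation runs.
SCHEDULE = [("load operands", Role.EXTERNAL),
            ("compute", Role.INTERNAL),
            ("write back results", Role.EXTERNAL)]

bus = ReconfigurableGlobalBus(shared=8, dedicated_external=2, dedicated_internal=4)
for phase, busy in SCHEDULE:
    bus.reconfigure(busy, bus.shared)  # switch all shared segments to the busy bus
    print(phase, bus.width(Role.EXTERNAL), bus.width(Role.INTERNAL))
```

Passing a segment count smaller than `bus.shared` corresponds to switching only part of the reconfigurable resources, as the text allows.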

The chip that communicates with the outside can be arranged as the top chip of the chip computing device. The top chip needs to be provided with an external global storage access interface controller and an external global access bus; time-division multiplexing of the external global access bus and the internal global access bus can then be achieved, reducing the global interconnection overhead of the chip computing device.

In some embodiments, the memory array chip, the programmable array chip and the operation array chip each include an active layer and an internal metal layer; the active layer comprises a functional array and/or an external global storage access interface controller; the internal metal layer includes an external global access bus, and/or an internal local memory access line, and/or an internal global access bus, and/or an external memory access line, and/or a compute interconnect line.

Adjacent chips in the storage array chip, the programmable array chip and the operation array chip are connected through a three-dimensional heterogeneous connection structure; the three-dimensional heterogeneous connection structure comprises a three-dimensional interconnection interface.

Fig. 2 is a schematic partial cross-sectional view of a chip computing device according to an embodiment of the present application. Illustratively, the active layer may be the first active layer 150, the second active layer 250, or the third active layer 340; the internal metal layer may be the first internal metal layer 160, the second internal metal layer 260, or the third internal metal layer 350; and the top metal layer may be the first top metal layer 170, the second top metal layer 270, or the third top metal layer 360. The three-dimensional heterogeneous connection structure may be the first three-dimensional heterogeneous connection structure 130 or the second three-dimensional heterogeneous connection structure 230. The memory array chip 100 comprises a first substrate layer 140, a first active layer 150, a first internal metal layer 160, and a first top metal layer 170 arranged in sequence; the programmable array chip 200 may comprise a second substrate layer 240, a second active layer 250, a second internal metal layer 260, and a second top metal layer 270 arranged in sequence; and the operation array chip 300 may comprise a third substrate layer 330, a third active layer 340, a third internal metal layer 350, and a third top metal layer 360 arranged in sequence. Referring to Fig. 1 and Fig. 2, the first three-dimensional heterogeneous connection structure 130 is disposed between the memory array chip 100 and the programmable array chip 200, and the second three-dimensional heterogeneous connection structure 230 is disposed between the programmable array chip 200 and the operation array chip 300.
The first top metal layer 170 of the memory array chip 100 faces the second top metal layer 270 of the programmable array chip 200, so that these two chips face each other; the third top metal layer 360 of the operation array chip 300 faces the second substrate layer 240 of the programmable array chip 200, so that the front side of the operation array chip 300 faces the back side of the programmable array chip 200. This arrangement is illustrative and not limiting.
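To make the layer orientations concrete, the sketch below lists the Fig. 2 stack from top to bottom. It is a hypothetical Python model that assumes, as in the Fig. 2 embodiment, that the memory array chip 100 is the top chip; the function names and layer labels are illustrative. A chip that faces downward shows its substrate uppermost, which is why the first top metal layer 170 sits directly above structure 130 and the second substrate layer 240 sits directly above structure 230.

```python
from typing import List, Tuple


def chip_layers(name: str, substrate: int, active: int, inner: int, top: int,
                face_down: bool) -> List[str]:
    """Return a chip's layers listed from the top of the stack downward."""
    # Fabrication order, from the substrate upward, as listed in the text.
    bottom_up = [f"{name} substrate layer {substrate}",
                 f"{name} active layer {active}",
                 f"{name} internal metal layer {inner}",
                 f"{name} top metal layer {top}"]
    # A face-down (flipped) chip shows its substrate uppermost.
    return bottom_up if face_down else bottom_up[::-1]


STACK = (chip_layers("memory", 140, 150, 160, 170, face_down=True)
         + ["3D heterogeneous connection structure 130"]
         + chip_layers("programmable", 240, 250, 260, 270, face_down=False)
         + ["3D heterogeneous connection structure 230"]
         + chip_layers("operation", 330, 340, 350, 360, face_down=False))


def neighbours(layer: str) -> Tuple[str, str]:
    """The layers immediately above and below `layer` in the stack."""
    i = STACK.index(layer)
    return STACK[i - 1], STACK[i + 1]


for layer in STACK:
    print(layer)
```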

It should be noted that, as shown in Fig. 1, the memory array chip 100 further includes a first structure 120, the programmable array chip 200 includes a second structure 220, and the operation array chip 300 includes a third structure 320. The first structure 120 and the second structure 220 correspond to the first three-dimensional heterogeneous connection structure 130, and the second structure 220 and the third structure 320 correspond to the second three-dimensional heterogeneous connection structure 230.

In the chip computing device shown in Fig. 2, the storage array chip 100 serves as the top chip. An outermost interface layer 400 may be disposed on the side of the storage array chip 100 away from the programmable array chip 200; it protects the chip computing device and leads out its external interfaces (PAD/BUMP). The first active layer 150 includes the memory array 110 and an external global memory access interface controller 151. The first internal metal layer 160 includes a plurality of metal connection lines, which may include an external global access bus A, a first internal local memory access line B1, a second internal local memory access line B2, and an external memory access line C. The first top metal layer 170 includes a plurality of first connection lines 171. The first three-dimensional heterogeneous connection structure 130 comprises a three-dimensional interconnection interface E; the first internal local storage access line B1 is connected to a first connection line 171, which is in turn connected to the three-dimensional interconnection interface E. The storage array 110 is therefore connected to the programmable array 210 through the first internal local storage access line B1 and the three-dimensional interconnection interface E, and this is how the functional arrays are connected across chips. The second internal local memory access line B2 connects devices within the same functional chip.
The external storage access line C may be connected to the storage array 110 and to the external global access bus A, and is further connected to the three-dimensional interconnection interface E, so that internal and external storage accesses can share the external storage access line C by time-division multiplexing in combination with the external global access bus A. The second top metal layer 270 may include a plurality of second connection lines 271. The second internal metal layer 260 may include a plurality of metal connection lines, such as an internal global access bus D, a first internal local memory access line B1, a second internal local memory access line B2, an external memory access line C, and a first compute interconnect line G1; the metal connection lines in the first internal metal layer 160 and in the third internal metal layer 350 may also include an internal global access bus. The second active layer 250 includes the programmable arrays 210, and the internal global access bus D interconnects the programmable arrays 210. The chip computing device may further include a plurality of vias 500 extending through the substrate layers and active layers: as shown in Fig. 2, vias 500 extend through the first substrate layer 140 and the first active layer 150, and through the second substrate layer 240 and the second active layer 250. Each via 500 connects the metal lines at its two ends.
The third top metal layer 360 includes a plurality of third connection lines 361. The second three-dimensional heterogeneous connection structure 230 includes a three-dimensional interconnection interface E that connects the operation array 310 and the programmable array 210, and the third active layer 340 includes the operation array 310. The third internal metal layer 350 includes a first compute interconnect line G1, which connects the three-dimensional interconnection interface E to the operation array 310 for cross-chip interconnection, and a second compute interconnect line G2, which connects devices or lines within the operation array 310.

Illustratively, in a first path, the external global access bus A and the external memory access line C in the first internal metal layer 160 are interconnected, and the external global access bus A then reaches the programmable array 210 across chips through the three-dimensional interconnection interface E and the external memory access line C in the second internal metal layer 260 that is connected to E. In a second path, the external global access bus A is connected through the memory array 110 to the first internal local memory access line B1 in the first internal metal layer 160, and then reaches the programmable array 210 across chips through the three-dimensional interconnection interface E and the first internal local memory access line B1 in the second internal metal layer 260 that is connected to E. Different programmable arrays 210 are then connected to one another through the internal global access bus D, completing the external or internal storage access of the chip computing device. The first compute interconnect line G1 in the second internal metal layer 260 connects the programmable array 210 to the three-dimensional interconnection interface E; together with E and the first compute interconnect line G1 in the third internal metal layer 350, it provides the cross-chip connection between the programmable array 210 and the operation array 310 and the corresponding internal storage access of the chip computing device. The third connection line 361 may be used to connect the three-dimensional interconnection interface E to the first compute interconnect line G1.
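The routing described in the preceding paragraph can be made concrete by treating each line as a node in a graph and searching for a path. The Python sketch below uses a simplified, hypothetical netlist: the node labels (such as `E130.C` for the pad of interface E used by line C, or `PA0` and `PA1` for two programmable arrays 210) are assumptions, only connections named in the Fig. 2 description are included, and time-division multiplexing is not modeled. A breadth-first search returns one shortest chain of nets between two endpoints.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical net-level graph distilled from the Fig. 2 routing description.
# @M1/@M2/@M3: first/second/third internal metal layer.
# E130.x / E230.x: pad of interface E (in structure 130 or 230) used by line x.
# PA0, PA1: two programmable arrays 210 (illustrative labels).
EDGES: List[Tuple[str, str]] = [
    ("controller 151", "A@M1"),
    ("A@M1", "C@M1"), ("C@M1", "E130.C"), ("E130.C", "C@M2"), ("C@M2", "PA0"),
    ("A@M1", "memory array 110"), ("memory array 110", "B1@M1"),
    ("B1@M1", "E130.B1"), ("E130.B1", "B1@M2"), ("B1@M2", "PA0"),
    ("PA0", "D@M2"), ("D@M2", "PA1"),
    ("PA0", "G1@M2"), ("G1@M2", "E230.G1"), ("E230.G1", "G1@M3"),
    ("G1@M3", "operation array 310"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency lists: every line conducts in both directions."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def route(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search: one shortest chain of nets from src to dst, or None."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


graph = build_graph(EDGES)
print(" -> ".join(route(graph, "memory array 110", "PA0")))
print(" -> ".join(route(graph, "controller 151", "PA1")))
print(" -> ".join(route(graph, "PA0", "operation array 310")))
```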

It should be noted that the connection between the memory array 110 and the programmable array 210 can be a one-to-one connection, a many-to-one connection, or a one-to-many connection, and the present application is not limited in particular. The structure of the chip computing device shown in fig. 2 is merely illustrative and not intended to be a specific limitation of the present application.

In the chip computing device provided by the embodiments of the present application, the chip that communicates with the outside can be arranged at the top layer, and that top-layer chip is provided with the external global memory access interface controller 151 and the external global access bus A.

In some embodiments, the orthographic projection of the memory array on the programmable array chip covers the programmable array. The coverage may be complete or partial; the present application does not limit this.

Illustratively, the programmable arrays 210 are distributed within the vertical projections of the physical locations of their corresponding memory arrays 110, and high-bandwidth interconnects, namely the internal local memory access lines B, are established between them through three-dimensional heterogeneous integration. Typically, each group of internal local memory access lines B is several thousand to several tens of thousands of bits wide, and the total bit width is several tens of thousands to several hundreds of thousands of bits, forming programmable operation memory arrays with a distributed memory access structure.
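To give a feel for why these bit widths matter, the following Python arithmetic estimates peak bandwidth as total bit width times transfer rate. The group count, per-group width, and 500 MHz rate are assumptions chosen for illustration (the resulting total of 65,536 bits falls within the range quoted above), and the comparison against a single 64-bit interface is likewise hypothetical.

```python
def peak_bandwidth_bytes_per_s(groups: int, bits_per_group: int, clock_hz: float) -> float:
    """Peak bandwidth of parallel links, assuming one bit per wire per clock cycle."""
    return groups * bits_per_group * clock_hz / 8


# Illustrative assumptions: 16 groups of 4096-bit local links at an assumed 500 MHz.
local = peak_bandwidth_bytes_per_s(groups=16, bits_per_group=4096, clock_hz=500e6)
# For comparison, a single conventional 64-bit interface at the same assumed rate.
narrow = peak_bandwidth_bytes_per_s(groups=1, bits_per_group=64, clock_hz=500e6)

print(f"local: {local / 1e12:.3f} TB/s, 64-bit: {narrow / 1e9:.1f} GB/s, "
      f"ratio: {local / narrow:.0f}x")
```

Under these assumptions the local links offer about 4.096 TB/s against 4.0 GB/s, a ratio of 1024; the ratio depends only on the width ratio, so it holds at any shared clock rate.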

In the chip computing device provided by the embodiments of the present application, the programmable arrays 210 are connected to the storage arrays 110 in one-to-one correspondence, and each programmable array 210 and its storage array 110 can form a programmable operation storage array. This widens the internal local storage access interface, provides high-bandwidth interconnection, and reduces storage access power consumption.

In some embodiments, the programmable array chip includes a controller (for example, a CPU or MCU; the present application does not limit its type), and the programmable array is connected to the controller. The controller is used to control the programmable array to dynamically schedule the execution flow of at least one fixed operation unit to obtain the target operation function; and/or the controller is used to control the programmable array to execute the current target operation function based on the result data obtained by executing the previous target operation function. It should be noted that the programmable array may execute the current target operation function based on the result data of the previous target operation function, on initial operation data, or on the result data of other target operation functions; the present application does not limit this.

Illustratively, a controller is provided for each programmable array 210 or for each group of programmable arrays 210. The programming result of a programmable array 210 can be stored in a configuration RAM (CRAM) within the programmable array 210; because CRAM loses its contents at power-off, the controller must load the programming file from outside the programmable array 210. The controller may also be responsible for boundary scanning of the programmable array 210, online data observation and loading, and the like. The controller may support dynamic partial reconfiguration to switch the functions of part of the programmable array 210 while operations are executing: for example, after one target operation function finishes, the corresponding programmable array 210 is dynamically reconfigured into the next target operation function, which inherits the result data of the previous target operation function (stored partly or entirely in the corresponding storage array 110). The controller may be implemented by a processor such as a CPU; the present application does not limit this.
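The controller's role in reloading volatile CRAM and chaining target operation functions through dynamic partial reconfiguration can be sketched as follows. This is a toy Python model under explicit assumptions: the classes `ProgrammableRegion` and `Controller`, the dictionary standing in for externally stored programming files, and the example functions (`scale`, `relu`, `reduce`) are hypothetical, and each target operation function is reduced to a list-to-list callable rather than a real schedule of fixed operation units.

```python
from typing import Callable, Dict, List, Optional

# A target operation function is modeled as a list -> list callable (assumption).
Kernel = Callable[[List[int]], List[int]]


class ProgrammableRegion:
    """Toy stand-in for one programmable array 210 and its volatile CRAM."""

    def __init__(self) -> None:
        self.cram: Optional[Kernel] = None  # empty until the controller loads it

    def power_cycle(self) -> None:
        self.cram = None  # CRAM contents are lost at power-off


class Controller:
    """Hypothetical controller that loads configurations and chains functions."""

    def __init__(self, programming_files: Dict[str, Kernel]) -> None:
        self.programming_files = programming_files    # stored outside the array
        self.memory_array: Dict[str, List[int]] = {}  # stands in for storage array 110

    def reconfigure(self, region: ProgrammableRegion, name: str) -> None:
        region.cram = self.programming_files[name]  # (partial) reconfiguration

    def run_chain(self, region: ProgrammableRegion, names: List[str],
                  initial_data: List[int]) -> List[int]:
        data = initial_data
        for name in names:
            self.reconfigure(region, name)  # switch to the next target function
            data = region.cram(data)        # it consumes the previous result
            self.memory_array[name] = data  # keep the result in the memory array
        return data


files: Dict[str, Kernel] = {
    "scale": lambda xs: [2 * x for x in xs],
    "relu": lambda xs: [max(0, x) for x in xs],
    "reduce": lambda xs: [sum(xs)],
}
controller = Controller(files)
print(controller.run_chain(ProgrammableRegion(), ["scale", "relu", "reduce"], [1, -2, 3]))  # [8]
```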

By providing the controller, the chip computing device provided by the embodiments of the present application can reconfigure the programmable array 210 into a target operation function and have it execute that function.

The number of chip layers of the operation array chip is at least two.

Two layers of storage array chips can be connected through a three-dimensional heterogeneous connection structure; likewise, two layers of programmable array chips, or two layers of operation array chips, can each be connected through a three-dimensional heterogeneous connection structure.

The chip computing device provided by the embodiments of the present application can adopt a multilayer chip structure according to budget or storage requirements, so as to meet greater operation and storage access demands. Specifically, when a large memory access capacity is required, the memory array chip can have at least two layers; when a large amount of computation is required, the operation array chip can have at least two layers; and when many types of target operation functions are required, the programmable array chip can have at least two layers.

In some embodiments, any two of the memory array, the programmable array, and the operation array are disposed on the same layer of the chip.

The chip computing device provided by the embodiments of the present application can thus merge any two of these chips into a single chip layer, which is particularly suitable when the required computing density is low.

Illustratively, when the memory array chip 100, the programmable array chip 200, and the operation array chip 300 have the same core voltage, they are connected directly by the metal interconnects of the three-dimensional heterogeneous integration. When their core voltages differ, a level shifter circuit is required; it can be placed on the memory array chip 100, generally on or near the three-dimensional heterogeneous integration bonding region, although the present application does not limit this.

Illustratively, when the core voltages of the memory array chip 100 and the programmable array chip 200 differ, a level shifter circuit is arranged on the memory array chip 100; by means of cross-chip three-dimensional heterogeneous integration, it can also be moved to the programmable array chip 200, generally on or near the bonding region. When the core voltages of the programmable array chip 200 and the operation array chip 300 are the same, the two chips are connected directly by the metal interconnects of the three-dimensional heterogeneous integration; when they differ, a level shifter circuit needs to be arranged on the programmable array chip 200, usually on or near the bonding region.
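The level-shifter rule described for the bonded chip pairs amounts to a simple per-bond decision, sketched here in Python. The function `plan_bonds`, the chip labels, and the core voltages are illustrative assumptions rather than values from this application; the third element of each bond tuple names the chip chosen to host the level shifter, which, as described, may be either side of the bond.

```python
import math
from typing import Dict, List, Tuple


def plan_bonds(core_voltage: Dict[str, float],
               bonds: List[Tuple[str, str, str]]) -> List[str]:
    """For each bonded pair (chip_a, chip_b, shifter_host), choose a direct metal
    interconnect when the core voltages match, otherwise a level shifter on
    `shifter_host` placed on or near the bonding region."""
    plan = []
    for chip_a, chip_b, host in bonds:
        if math.isclose(core_voltage[chip_a], core_voltage[chip_b]):
            plan.append(f"{chip_a}/{chip_b}: direct metal interconnect")
        else:
            plan.append(f"{chip_a}/{chip_b}: level shifter on {host} near bonding region")
    return plan


# Illustrative core voltages only; they are not taken from this application.
volts = {"memory 100": 1.1, "programmable 200": 0.8, "operation 300": 0.8}
bonds = [("memory 100", "programmable 200", "memory 100"),
         ("programmable 200", "operation 300", "programmable 200")]
for line in plan_bonds(volts, bonds):
    print(line)
```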

By providing level shifter circuits, the chip computing device provided by the embodiments of the present application still supports cross-chip storage access when different chips use different core voltages.

In some embodiments, the memory array chip may include at least one of a memory array die and a memory array wafer; and/or,

the operation array chip may include at least one of an operation array die and an operation array wafer; and/or,

the programmable array chip may include at least one of a programmable array die and a programmable array wafer.

It should be noted that each chip may be at least one of a die (also called a chip) and a wafer, but is not limited thereto and may be any alternative that a person skilled in the art can conceive of. A wafer is a silicon wafer on which silicon semiconductor circuits are manufactured, and a die (or chip) is a piece obtained by dicing a wafer on which the semiconductor circuits have been manufactured. The specific embodiments of the present application are described taking a chip as an example.

In a second aspect, the embodiments of the present application provide a computing system, including a host system and the chip computing device described in the first aspect. An external lead-out interface is provided on the chip of the chip computing device that carries the external global storage access interface controller, and the chip computing device is connected to the host system through this external lead-out interface.

For example, Fig. 3 is a schematic block diagram of a computing system provided in an embodiment of the present application. As shown in Fig. 3, the computing system includes a host system 2000 and a chip computing device 1000 as described in the first aspect. Referring to Fig. 2, an external lead-out interface F is provided on the chip of the chip computing device 1000 that carries the external global storage access interface controller 151, and the chip computing device 1000 is connected to the host system 2000 through the external lead-out interface F. As shown in Fig. 2, the external lead-out interface F is connected through a through-silicon via 500 to a connection line of the host system, and is also connected to the external global storage access interface controller 151.

In the computing system provided by the embodiments of the present application, the programmable array dynamically schedules the execution flow of at least one fixed operation unit to obtain a target operation function; because many different target operation functions can be obtained by dynamically scheduling the execution flow of existing fixed operation units, the operation function of the chip computing device can be reconfigured. Any two of the storage array chip, the programmable array chip, and the operation array chip are stacked and connected through three-dimensional heterogeneous integration, which provides very large local bus bandwidth both between adjacent chips and across chips, increasing storage access bandwidth and reducing storage access power consumption. Unlike a conventional I/O (input/output) interface, the three-dimensional heterogeneous stacking that connects the programmable array and the operation array to the storage array allows storage access to be completed within the chip computing device with high bandwidth and low power consumption. The computational burden of the host system can therefore be transferred to the chip computing device, offloading computing power from the host system, improving operation and storage access efficiency, and reducing power consumption.

While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

Claims

1. A chip computing device, comprising:

2. The chip computing device of claim 1, wherein the number of target arithmetic functions is at least one.

3. The chip computing device of claim 2, wherein one of the memory array chip, the programmable array chip, and the arithmetic array chip comprises an external global memory access interface controller and an external global access bus, the external global memory access interface controller being connected to the external global access bus;

4. The chip computing device of claim 3, wherein the memory array chip, the programmable array chip, and the operational array chip each include external memory access lines that are connected to the three-dimensional interconnect interface and the functional array, respectively.

5. The chip computing device of claim 4, wherein a compute interconnect is included within the compute array chip, the compute interconnect connecting the three-dimensional interconnect interface and the compute array, respectively.

6. The chip computing device of claim 5, wherein the memory array chip, the programmable array chip, and the operational array chip each include an active layer and an internal metal layer;

the internal metal layer includes the external global access bus, and/or an internal local memory access line, and/or the internal global access bus, and/or the external memory access line, and/or the compute interconnect line.

7. The chip computing device of claim 3, wherein adjacent ones of the memory array chip, the programmable array chip, and the operational array chip are connected by a three-dimensional heterogeneous connection structure;

8. The chip computing device of claim 1, wherein an orthographic projection of the storage array on the programmable array chip overlays the programmable array.

9. The chip computing device of claim 1, wherein the programmable array chip includes a controller, the programmable array being connected to the controller;

10. The chip computing device of claim 1, wherein the memory array chip has at least two layers; and/or,

The number of the chip layers of the operation array chip is at least two.

11. The chip computing device of claim 1, wherein any two of the memory array, the programmable array, and the operational array are disposed on a same layer of a chip.

12. The chip computing device of claim 1, wherein at least one of the memory array chip, the programmable array chip, and the operational array chip includes level shifting circuitry.

13. The chip computing device of claim 1, wherein the memory array comprises one or a combination of at least two of static random access memory, dynamic random access memory, Flash memory, ferroelectric memory, phase change memory, magnetic memory, and resistive memory.

14. The chip computing device of any of claims 1-13, wherein the memory array chip comprises at least one of a memory array die and a memory array wafer; and/or,

15. A computing system, comprising: a host system and a chip computing device according to any one of claims 1-14;