CN116360873A - Wafer-level chip and wake-up method of dormant calculation crystal grains thereof - Google Patents

Publication number
CN116360873A
Authority
CN
China
Prior art keywords
die
computing
grain
calculation
dies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310348741.8A
Other languages
Chinese (zh)
Inventor
潘岳
姜申飞
胡杨
李霞
朱小云
王立华
王磊
郝培霖
韩慧明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-04-03
Filing date
2023-04-03
Publication date
2023-06-30
Application filed by Shanghai AI Innovation Center
Priority to CN202310348741.8A
Publication of CN116360873A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4418 Suspend and resume; Hibernate and awake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computer Security & Cryptography (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Semiconductor Integrated Circuits (AREA)

Abstract

The invention discloses a wafer-level chip comprising a plurality of homogeneous computing dies arranged in an array. Each computing die is configured to keep continuous communication with its adjacent computing dies while the chip is powered on, so that an adjacent dormant computing die can be woken by a working computing die when needed. This reduces the time the system needs to wake a dormant computing die and improves the overall efficiency of the chip.

Description

Wafer-level chip and wake-up method of dormant calculation crystal grains thereof
Technical Field
The invention relates to the technical field of cloud computing, and in particular to a wafer-level chip and a method for waking up its dormant computing dies.
Background
To save the time and power spent moving data between chips, wafer-scale chips are currently used in some high-performance computing, high-computing-power, and cloud computing tasks to deliver very high computing capability. A wafer-level chip includes a plurality of dies (DIE). Some of the dies communicate with the outside and may be referred to as interface dies (IO-DIE); the remaining dies perform computation and may be referred to as compute dies (C-DIE).
Because a cloud wafer-level chip consumes a great deal of power, under low task load the master control (HOST) may put some of the computing dies into sleep mode through the IO-DIE to reduce overall power consumption. When the HOST detects that the currently running computing dies can no longer carry the load, it wakes the dormant computing dies through the IO-DIE. This process relies heavily on the HOST's scheduling of the existing task volume and has to wait on the running program, and the data path is long, so waking a C-DIE in sleep mode takes a long time.
Disclosure of Invention
In view of some or all of the problems in the prior art, a first aspect of the present invention provides a wafer-level chip, comprising:
a plurality of computing dies arranged in an array, wherein the computing dies are all of the same type, and each computing die is configured to be capable of continuous communication with its neighboring computing dies while the chip is powered on.
Further, the wafer-level chip further includes:
an interface die communicatively connected with the computing dies; and
a power management and microcontroller module for power supply and task management of each computing die.
Further, the computing dies are communicatively connected by a die-to-die (D2D) interconnect sub-module, wherein the die interconnect sub-module employs an extra-short-reach serial interconnect (XSR) or the UCIe protocol.
Further, the die interconnect sub-module and the power management and microcontroller module are disposed in an always-on voltage domain.
Further, the computing die includes a data receiving buffer (RX-BUFFER) for receiving and storing data required for a task.
Further, the wafer-level chip includes N interface dies, each interface die being connected to any one of the computing dies in a respective row of computing dies.
A second aspect of the invention provides a method for waking up a dormant computing die of the wafer-level chip, comprising:
a first computing die in a working state predicts a load trend within a specified duration and, when the load trend is equal to or greater than a threshold value, wakes an adjacent second computing die in sleep mode and reports to the master control;
the master control sends the data required by the second computing die to the data receiving buffer of the second computing die through the first computing die while, at the same time, the second computing die sends an interrupt to the power management and microcontroller module in order to be powered on; and
after the second computing die is powered on, it reads the data from the data receiving buffer and enters a working state.
Further, the method further comprises:
performing periodic handshaking between the first computing die and the second computing die.
Further, waking an adjacent second computing die in sleep mode includes:
the first computing die sends a fixed wake-up command to the second computing die through the die interconnect sub-module.
Further, the specified duration is set through software; and/or
the specified duration is in the range of 3 to 5 time periods, wherein one time period is equal to the sum of the power-on and power-off durations.
With the wafer-level chip and the wake-up method for its dormant computing dies, the interconnection paths among the computing dies are kept in a normal working mode, so that an adjacent dormant computing die can be woken by a working computing die when needed. This reduces the time the system needs to wake a dormant computing die and improves the overall efficiency of the chip.
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, for clarity, the same or corresponding parts will be designated by the same or similar reference numerals.
FIG. 1 is a schematic diagram of a wafer level chip according to one embodiment of the invention;
FIG. 2 is a schematic diagram showing the connection of two adjacent computing dies in a wafer level chip according to one embodiment of the present invention; and
FIG. 3 is a flow chart illustrating a wake-up method for a dormant computing die in a wafer-level chip according to one embodiment of the invention.
Detailed Description
In the following description, the present invention is described with reference to various embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention. Similarly, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. However, the invention is not limited to these specific details. Furthermore, it should be understood that the embodiments shown in the drawings are illustrative representations and are not necessarily drawn to scale.
Reference throughout this specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that the embodiments of the present invention describe the steps of the method in a specific order. This is merely for the purpose of illustrating the specific embodiments and does not limit the order of the steps. On the contrary, in different embodiments of the present invention, the order of the steps may be adjusted according to actual requirements.
In existing wafer-level chips, once a computing die enters a dormant state it is powered down, and its communication links to the remaining computing dies are interrupted as well. When a dormant computing die needs to be woken, it must be powered up, and its communication links with the other computing dies must also be re-established. Re-establishing a communication link requires a certain training time and a certain calibration time, and data can only be transmitted after the link has been re-established. In the prior art, waking a dormant computing die therefore takes a long time, which affects the overall efficiency of the chip. To improve chip efficiency and reduce the wake-up time of dormant computing dies, the invention provides a wafer-level chip and a wake-up method for its dormant computing dies. The communication links between computing dies are always kept in a normal state, so that a dormant die can be woken by an adjacent die in a working state, ahead of the master control HOST, without re-establishing the communication link.
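The timing argument above can be made concrete with a small model. The following is a minimal sketch and not part of the patent: it assumes the prior-art steps (power-up, link training, link calibration, data transfer) run strictly one after another, and that with an always-on link the data transfer overlaps with power-up. The function names and the microsecond figures are hypothetical and chosen only to make the arithmetic visible.

```python
def wake_latency_prior_art(power_up, link_training, link_calibration, data_transfer):
    """Assumed prior-art model: every phase waits for the previous one to finish."""
    return power_up + link_training + link_calibration + data_transfer


def wake_latency_always_on_link(power_up, data_transfer):
    """Assumed model for an always-on link: data is buffered while the die powers up."""
    return max(power_up, data_transfer)


# Hypothetical durations in microseconds (illustrative only, not from the patent).
prior = wake_latency_prior_art(power_up=200, link_training=500,
                               link_calibration=100, data_transfer=50)
kept = wake_latency_always_on_link(power_up=200, data_transfer=50)
print(prior, kept)  # prints: 850 200
```

Under these assumptions the training and calibration terms drop out entirely, and the data transfer is hidden behind the power-up time.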
In the present invention, the term "continuous communication" means that communication is maintained for a specified period of time; outside this period, communication may continue, be intermittent, or stop.
The embodiments of the present invention will be further described with reference to the accompanying drawings.
FIG. 1 shows a schematic structure of a wafer-level chip according to one embodiment of the invention. As shown in FIG. 1, the wafer-level chip includes computing dies 101, interface dies 102, and a power management unit (PMU) and microcontroller unit (MCU) module 103.
The computing dies 101 (C-DIE, Compute DIE) perform computing tasks and are arranged in an array of N rows and M columns, where N and M are natural numbers. Each computing die is communicatively connected to its neighboring computing dies, i.e., the previous and next computing dies in the same column and the previous and next computing dies in the same row. It should be understood that in embodiments of the present invention all of the computing dies are homogeneous, i.e., all are the same type of die.
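The neighbor relation in the N-row, M-column array can be sketched as follows. This is an illustrative helper, not taken from the patent; the function name and the zero-based (row, column) indexing are assumptions.

```python
def neighbours(row, col, n_rows, m_cols):
    """Return the coordinates of the dies directly above, below, left and right.

    Assumes zero-based (row, col) indices; dies on an edge simply have
    fewer neighbours.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < m_cols]


# In a 3 x 4 array a corner die has two neighbours, an interior die has four.
print(neighbours(0, 0, 3, 4))  # [(1, 0), (0, 1)]
print(neighbours(1, 1, 3, 4))  # [(0, 1), (2, 1), (1, 0), (1, 2)]
```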
In a wafer-level chip, computing tasks are usually executed in a pipelined manner. A computing die in the dormant state is denoted a dormant computing die CO-DIE 111. As shown in FIG. 1, a computing task is executed first by the computing die C-DIE 101 in the first row and first column; when its capacity is insufficient for the load, the dormant computing dies 111 in the second row and in the second column are woken, and so on. As a result, for the next computing die that is to take on tasks, the preceding computing die in the same row and/or the preceding computing die in the same column is necessarily in a working state and can wake it. Based on this, to shorten the wake-up time, a computing die remains in continuous communication with its neighboring computing dies while the chip is powered on, even after it has entered the sleep state. In one embodiment of the invention, a dormant computing die and the adjacent computing die in the working state perform periodic handshakes at all times, which keeps the communication link clock running normally.
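The periodic handshake can be pictured with the toy model below. It is a sketch under stated assumptions: the patent does not specify a handshake period, a timeout, or any class like `D2DLink`; the names, the tick-based clock, and the `missed_allowed` tolerance are all hypothetical.

```python
class D2DLink:
    """Hypothetical model of one die-to-die link kept alive by periodic handshakes."""

    def __init__(self, handshake_period, missed_allowed=2):
        self.handshake_period = handshake_period
        # Assumption: the link counts as lost after this many missed periods.
        self.missed_allowed = missed_allowed
        self.last_handshake = 0

    def handshake(self, now):
        """Record a successful handshake at time `now` (arbitrary ticks)."""
        self.last_handshake = now

    def is_alive(self, now):
        """The link is usable if a handshake happened recently enough."""
        return now - self.last_handshake <= self.missed_allowed * self.handshake_period


link = D2DLink(handshake_period=10)
for t in range(0, 40, 10):   # handshakes at t = 0, 10, 20, 30
    link.handshake(t)

print(link.is_alive(35))     # True: the last handshake was 5 ticks ago
print(link.is_alive(100))    # False: 70 ticks of silence exceeds the 20-tick tolerance
```

As long as the handshakes keep arriving, the working die can treat the link to its dormant neighbor as ready for a wake-up command without retraining.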
In one embodiment of the invention, the computing dies are communicatively connected by a die-to-die (D2D) interconnect sub-module. In one embodiment of the invention, the die interconnect sub-module implements the communication connection between computing dies based on extra-short-reach serial interconnect (XSR SerDes) or the UCIe protocol. Serial interconnect technology (SerDes) uses differential signaling to achieve high-speed data transmission. It needs few IOs, reaches long distances, and runs at high speed, and it is now widely used for high-speed interconnection between systems or chips. Extra-short-reach serial interconnect (XSR SerDes) refers to a serial interconnect with a short distance between transmitter and receiver; it is specified in the Optical Internetworking Forum Common Electrical I/O specification (OIF-CEI 4.0) specifically for interconnection between dies, and features low power consumption, small area, and a flexible communication protocol. UCIe 1.0 defines a low-power D2D interconnect physical layer (PHY) and is compatible with multiple protocols, including PCIe (the high-speed serial computer expansion bus standard), CXL (the computer interconnect standard), and Raw Mode.
FIG. 2 is a schematic diagram of the connection between two adjacent computing dies in a wafer-level chip according to one embodiment of the invention, namely a computing die C-DIE in a working state and a dormant computing die CO-DIE. As shown in FIG. 2, in one embodiment of the present invention, the computing die includes a control module (Controller), an interconnect physical layer (PHY), and a data receiving buffer (RX BUFFER).
The control module executes the computing tasks of the computing die, the interconnect physical layer handles interconnect communication between two computing dies, and the data receiving buffer receives and stores the data required by a task. As shown in FIG. 2, to keep the communication link clock alive, in one embodiment of the invention the die interconnect sub-module, together with the power management and microcontroller module 103, is placed in an always-on power domain.
The interface die (IO-DIE) 102 is communicatively connected with the computing dies 101 and connects the computing dies to a master control HOST and/or other external chips, units, modules, and the like. Since the chip operates in a pipelined fashion, in one embodiment of the present invention N interface dies are provided in total, where N is the number of rows of the computing die array. As shown in FIG. 1, each interface die is connected to any one of the computing dies in its row, preferably the first computing die in the row.
The power management unit (PMU) and microcontroller unit (MCU) module 103 is in an always-on power domain and handles power supply and task management for each computing die, for example starting the power-up procedure of a dormant computing die after receiving an interrupt.
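The interrupt-driven power-up handled by the PMU/MCU module can be sketched as a small state machine. This is an illustration only: the three states, the `PowerController` class, and the `on_power_good` event are assumptions introduced here, since the patent only states that an interrupt starts the power-up procedure.

```python
from enum import Enum


class PowerState(Enum):
    SLEEP = "sleep"
    POWERING_UP = "powering_up"
    ACTIVE = "active"


class PowerController:
    """Hypothetical stand-in for the PMU/MCU module in the always-on domain."""

    def __init__(self, die_ids):
        self.state = {die_id: PowerState.SLEEP for die_id in die_ids}

    def on_interrupt(self, die_id):
        # Only a sleeping die starts a power-up sequence; repeated interrupts are ignored.
        if self.state[die_id] is PowerState.SLEEP:
            self.state[die_id] = PowerState.POWERING_UP

    def on_power_good(self, die_id):
        # Assumed completion event: the supply is stable and the die may start work.
        if self.state[die_id] is PowerState.POWERING_UP:
            self.state[die_id] = PowerState.ACTIVE


pmu = PowerController(["C-DIE(0,0)", "C-DIE(0,1)"])
pmu.on_interrupt("C-DIE(0,1)")
print(pmu.state["C-DIE(0,1)"].value)  # powering_up
pmu.on_power_good("C-DIE(0,1)")
print(pmu.state["C-DIE(0,1)"].value)  # active
```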
FIG. 3 is a flow chart illustrating a wake-up method for a dormant computing die in a wafer-level chip according to one embodiment of the invention. As shown in FIG. 3, the method for waking up a dormant computing die of the wafer-level chip includes:
First, in step 301, a load trend is predicted. While the first computing die in the working state is running, it predicts the load trend of itself or of the whole system; once the load predicted for the end of the specified duration is greater than or equal to a threshold value, the method proceeds to step 302 to wake the dormant computing die. In practical applications, the specified duration may be set according to the usage scenario and working conditions of the product. However, if the specified duration is too short, the current paths may be powered up and down frequently; for example, when the load trend drops suddenly, the resulting shutdown of connected dies may affect system stability. Therefore, in one embodiment of the present invention, the specified duration is preferably set in the range of 3 to 5 time periods, where one time period is equal to the sum of the durations of one power-up and one power-down. In one embodiment of the present invention, the specified duration may be set by software, with the hardware providing a configuration register for setting the time window;
In step 302, a wake-up command is sent. When the predicted load trend is equal to or greater than the threshold value, the first computing die wakes the adjacent second computing die in sleep mode and reports to the master control HOST. In one embodiment of the present invention, the first computing die sends a fixed wake-up command to the second computing die through the die interconnect sub-module;
Next, in step 303, a power-up procedure is started. After the second computing die receives the fixed wake-up command, it sends an interrupt to the MCU in the always-on power domain to start its power-up procedure;
At the same time, in step 304, data is issued. After receiving the report from the first computing die, the master control HOST issues the data required by the second computing die. At this point the second computing die may not yet be fully awake and therefore cannot communicate with the HOST, so the data is sent to the first computing die. As described above, periodic handshaking keeps the communication link between the first and second computing dies in a normal state at all times, so after receiving the data the first computing die can send it directly to the data receiving buffer of the second computing die without waiting for the second computing die to finish powering up; and
Finally, in step 305, a computing task is performed. After the second computing die has powered up, it reads the data from the data receiving buffer, enters the working state, and executes the corresponding computing tasks.
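The decision made in step 301 can be sketched as follows. The patent does not say how the load trend is predicted, so the linear extrapolation from the last two samples is purely an assumption, as are the function names and the example figures; only the rule that the window spans 3 to 5 periods, each equal to one power-up plus one power-down duration, comes from the text.

```python
def prediction_window(power_on, power_off, periods=4):
    """Specified duration: 3 to 5 periods of (power-on + power-off) time."""
    if not 3 <= periods <= 5:
        raise ValueError("periods outside the preferred range of 3 to 5")
    return periods * (power_on + power_off)


def predicted_load(samples, sample_interval, window):
    """Assumed predictor: extend the slope of the last two load samples over the window."""
    return samples[-1] + (samples[-1] - samples[-2]) * window / sample_interval


def should_wake(samples, sample_interval, window, threshold):
    """Step 301 decision: wake the neighbour if the predicted load reaches the threshold."""
    return predicted_load(samples, sample_interval, window) >= threshold


# Hypothetical numbers: times in microseconds, load in percent.
window = prediction_window(power_on=200, power_off=100, periods=4)
print(window)                                        # 1200
print(predicted_load([50, 60], 100, window))         # 180.0
print(should_wake([50, 60], 100, window, 80))        # True
print(should_wake([60, 50], 100, window, 80))        # False
```

A longer window makes the decision less sensitive to short load spikes, which matches the stated goal of avoiding frequent power cycling.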
By keeping the interconnection paths among the computing dies in a normal working mode, the wafer-level chip can wake an adjacent dormant computing die through a working computing die when needed, which reduces the time the system needs to wake a dormant computing die and improves the overall efficiency of the chip.
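The flow of steps 301 to 305 can be walked through with the simplified simulation below. It is a sketch under assumptions: the `ComputeDie` class and `wake_flow` function are hypothetical, the HOST report is reduced to a comment, and steps 303 and 304, which the description runs concurrently, are serialized here for readability. The point it illustrates is that data reaches the RX buffer while the second die is still not awake.

```python
from collections import deque


class ComputeDie:
    def __init__(self, name, awake):
        self.name = name
        self.awake = awake
        self.rx_buffer = deque()   # RX buffer, reachable over the always-on link
        self.processed = []


def wake_flow(first, second, predicted_load, threshold, host_data):
    """Walk through steps 301 to 305 once; return a log of (step, second.awake)."""
    log = []
    # Step 301: the working die compares its predicted load with the threshold.
    log.append((301, second.awake))
    if predicted_load < threshold or second.awake:
        return log
    # Step 302: fixed wake-up command over the D2D link, plus a report to HOST.
    log.append((302, second.awake))
    # Step 303: the second die raises an interrupt; power-up starts but is not finished.
    log.append((303, second.awake))
    # Step 304: HOST data arrives via the first die and lands in the RX buffer.
    assert first.awake, "only a working die can forward data"
    second.rx_buffer.extend(host_data)
    log.append((304, second.awake))
    # Step 305: power-up completes; the die drains its buffer and starts work.
    second.awake = True
    while second.rx_buffer:
        second.processed.append(second.rx_buffer.popleft())
    log.append((305, second.awake))
    return log


first = ComputeDie("C-DIE", awake=True)
second = ComputeDie("CO-DIE", awake=False)
log = wake_flow(first, second, predicted_load=90, threshold=80, host_data=["w0", "w1"])
print(log[-2])            # (304, False): data was buffered before power-up finished
print(second.processed)   # ['w0', 'w1']
```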
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to those skilled in the relevant art that various combinations, modifications, and variations can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention as disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (10)

1. A wafer-level chip, comprising:
a plurality of computing dies arranged in an array, wherein the computing dies are all of the same type of die, and each computing die is configured to be capable of continuous communication with its neighboring computing dies while the chip is powered on.
2. The wafer-level chip of claim 1, further comprising:
an interface die communicatively connected with the computing die and configured to enable a communicative connection of the computing die with a master control and/or an external chip or unit; and
a power management and microcontroller module configured to perform power and task management of the computing die.
3. The wafer-level chip of claim 1, wherein the computing dies are communicatively connected by a die interconnect sub-module, wherein the die interconnect sub-module uses an extra-short-reach serial interconnect or the UCIe protocol.
4. The wafer-level chip of claim 3, wherein the die interconnect sub-module is disposed in an always-on voltage domain.
5. The wafer-level chip of claim 1, wherein the computing die includes a data receiving buffer configured to receive and store data required for a task.
6. The wafer-level chip of claim 1, comprising N interface dies, where N is the number of rows of the array of computing dies, each interface die being connected to any one of the computing dies in a respective row.
7. A method of waking up a dormant computing die of a wafer-level chip according to any one of claims 1 to 6, comprising the steps of:
a first computing die in a working state predicts a load trend within a specified duration and, when the load trend is equal to or greater than a threshold value, wakes an adjacent second computing die in sleep mode and reports to a master control;
the master control sends the data required by the second computing die to a data receiving buffer of the second computing die through the first computing die while, at the same time, the second computing die sends an interrupt to a power management and microcontroller module in order to be powered on; and
after the second computing die is powered on, it reads the data from the data receiving buffer and executes tasks.
8. The wake-up method of claim 7, further comprising the step of:
performing periodic handshaking between the first computing die and the second computing die.
9. The wake-up method of claim 7, wherein waking an adjacent second computing die in sleep mode comprises the step of:
the first computing die sends a fixed wake-up command to the second computing die through a die interconnect sub-module.
10. The wake-up method of claim 7, wherein the specified duration is set by software; and/or
the specified duration is in the range of 3 to 5 time periods, wherein one time period is equal to the sum of the power-on and power-off durations.
CN202310348741.8A 2023-04-03 2023-04-03 Wafer-level chip and wake-up method of dormant calculation crystal grains thereof Pending CN116360873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310348741.8A CN116360873A (en) 2023-04-03 2023-04-03 Wafer-level chip and wake-up method of dormant calculation crystal grains thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310348741.8A CN116360873A (en) 2023-04-03 2023-04-03 Wafer-level chip and wake-up method of dormant calculation crystal grains thereof

Publications (1)

Publication Number Publication Date
CN116360873A true CN116360873A (en) 2023-06-30

Family

ID=86907640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310348741.8A Pending CN116360873A (en) 2023-04-03 2023-04-03 Wafer-level chip and wake-up method of dormant calculation crystal grains thereof

Country Status (1)

Country Link
CN (1) CN116360873A (en)

Similar Documents

Publication Publication Date Title
US8286011B2 (en) Method of waking processor from sleep mode
US7188263B1 (en) Method and apparatus for controlling power state of a multi-lane serial bus link having a plurality of state transition detectors wherein powering down all the state transition detectors except one
US8479028B2 (en) Techniques for communications based power management
US7366930B2 (en) System and method for successfully negotiating a slowest common link speed between a first and second device
CN100442204C (en) System-on-chip chip and its power consumption control method
CN102799550B (en) Based on the waking up of chip chamber high-speed interface HSIC, hot-plug method and equipment
US7467313B2 (en) Method for transmitting a power-saving command between a computer system and peripheral system chips
EP3215907B1 (en) Integrated system with independent power domains and split power rails for logic and memory
US7467308B2 (en) Method for transmitting the system command of a computer system
US9477293B2 (en) Embedded controller for power-saving and method thereof
CN101504565A (en) Method for awakening chip module
US10394309B2 (en) Power gated communication controller
CN106063304B (en) System and method for message-based fine-grained system-on-chip power control
CN106774808B (en) A kind of multistage low-power consumption administrative unit and its method of multi-core chip
US9612652B2 (en) Controlling power consumption by power management link
CN100410846C (en) Method for realizing real-time clock waking-up of notebook computer
US7469349B2 (en) Computer system and method of signal transmission via a PCI-Express bus
CN104750223B (en) Method and system for reducing memory access power consumption of multi-core terminal
CN113254216B (en) Edge computing module and power consumption control method thereof
CN116360873A (en) Wafer-level chip and wake-up method of dormant calculation crystal grains thereof
CN201698324U (en) Embedded system
CN103188736B (en) Based on the ANT node power power-economizing method of flow control
TWI831611B (en) Microcontroller and control method thereof
CN111722559B (en) Low-power-consumption processing method based on DSP and FPGA architecture
CN220651253U (en) Chip combined architecture of electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination