CN115617739B - Chip based on Chiplet architecture and control method - Google Patents


Info

Publication number
CN115617739B
Authority
CN
China
Prior art keywords
core particle
calculation
computing
computing system
chip
Prior art date
Legal status
Active
Application number
CN202211183717.5A
Other languages
Chinese (zh)
Other versions
CN115617739A (en)
Inventor
张加宏
韩国庆
徐俊杰
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202211183717.5A
Publication of CN115617739A
Application granted
Publication of CN115617739B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

Abstract

The invention discloses a chip based on a Chiplet architecture, comprising a CPU die, a package shell, an IO die, a first computing system, a second computing system, a silicon interposer, and a substrate, in which the CPU die is mounted in the package shell. The first and second computing systems have the same structure, each comprising two memory units and two compute dies. The CPU die (1) exchanges data with the IO die, the first computing system, and the second computing system over the UCIe bus. In the control method of the invention, during the periodic working phase the periodic pulse signals of the first and second computing systems are synchronized with each other, and within each computing system the periodic pulse signals are cross-checked. By using periodic pulse signals, the invention synchronizes and monitors all compute dies in the chip and at the same time controls the performance and power consumption of the whole chip.

Description

Chip based on Chiplet architecture and control method
Technical Field
The present invention relates to chips with a Chiplet architecture, and more particularly, to a chip based on a Chiplet architecture and a control method thereof.
Background
System on Chip (SoC): in the narrow sense, an SoC integrates the core of an information system on one chip, placing the system's key components on a single die; in the broad sense, an SoC is a miniature system: if the central processing unit (CPU) is the brain, the SoC is a system that includes the brain, heart, eyes, and hands. An SoC is generally defined as a microprocessor, analog IP cores, digital IP cores, and memory (or an off-chip memory control interface) integrated on a single chip, typically as a custom or standard product for a particular application. SoC design emphasizes integrating as much IP as possible into one chip. The advantages are a smaller overall circuit area, lower power consumption, and higher reliability; and because the IP blocks are connected by a data bus, information transfer is more efficient. Despite these advantages, SoC-based chips also have clear disadvantages:
1) The cycle from design to mass production of an SoC is long, generally about 12 months;
2) Design verification takes a long time, generally about 70% of the whole cycle;
3) IP licensing and compatibility can greatly delay time to market;
4) The cost of advanced manufacturing processes increases exponentially;
5) For small-volume products, such as aviation equipment, an SoC is not the best choice because the cost rises steeply;
6) The memory and IO portions of an SoC occupy most of the chip area, so process improvements yield little gain in CPU performance.
Chiplet technology was developed to overcome these disadvantages of SoCs. Its characteristics are:
1) Chiplets can greatly improve the yield of large chips;
2) Chiplets reduce design complexity and cost;
3) Chiplets also reduce chip manufacturing cost;
4) Chiplets greatly shorten the chip development cycle;
5) Chiplets can improve performance.
Not all circuits need to be designed and fabricated on advanced nodes, and not all circuits on the same chip benefit from scaling. Decomposing a large chip into several smaller chiplets that are mixed and matched as needed lowers cost and raises yield, and allows chiplets built on different process nodes to be integrated in one package.
Most existing chips lack a safety-computing function; that is, they cannot perform one-out-of-two selection between redundant computations on chip. In the prior art, the failure of any other die in the chip may halt the whole chip.
Disclosure of Invention
Objective: the invention aims to provide a high-performance, high-safety computing chip for large servers, autonomous driving, and other application scenarios that require high bandwidth and high safety.
Technical solution: the chip of the Chiplet architecture comprises a CPU die, a package shell, an IO die, a first computing system, a second computing system, a silicon interposer, and a substrate. The first and second computing systems have the same structure, each comprising two memory units and two compute dies. Each memory unit is stacked on one compute die and connected to it through first TSVs (through-silicon vias), and the first TSVs are connected to the RDL through micro-bumps. The CPU die, the compute dies, and the IO die are each connected to the silicon interposer through micro-bumps. The silicon interposer is connected to the substrate through second TSVs and copper bumps, and solder balls are provided on the bottom of the substrate. The CPU die, IO die, memory units, compute dies, silicon interposer, and substrate are all housed in the package shell;
the CPU die exchanges data, including configuration parameters, with the IO die, the first computing system, and the second computing system over the UCIe bus.
Further, each memory unit comprises four identical memory dies stacked in sequence on one compute die, each memory die being connected to the compute die through a first TSV.
Further, the first computing system comprises two memory units, a first compute die, and a second compute die. The first compute die generates a periodic pulse signal and detects the pulse signal forwarded back by the second compute die; the second compute die checks the periodic pulse signal and forwards it to the first compute die. At the same time, the first compute die synchronizes to the periodic pulse signal input from the second computing system.
A control method for a chip based on the Chiplet architecture, in which the chip operates in two phases: an initialization phase and a periodic working phase. After power-on, the chip is in the initialization phase, during which the CPU die, compute dies, IO die, and memory dies are initialized;
after initialization is complete, the CPU die sends configuration parameters to each compute die over the UCIe bus. Once the CPU die has received a configuration-complete return code from every compute die, initialization of the whole chip is complete and the chip enters the periodic working phase.
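The initialization handshake can be sketched as follows. This is a minimal illustration, assuming a hypothetical `CONFIG_DONE` return code and modeling each UCIe transaction as a direct method call; the patent does not specify code values, the parameter format, or any timeout behavior.

```python
from dataclasses import dataclass, field

# Hypothetical "configuration complete" return code; the patent gives no value.
CONFIG_DONE = 0x00


@dataclass
class ComputeDie:
    die_id: int
    healthy: bool = True                 # False models a die that never acknowledges
    config: dict = field(default_factory=dict)

    def configure(self, params: dict):
        """Apply the parameters and return a code, or None if the die does not respond."""
        if not self.healthy:
            return None
        self.config = dict(params)
        return CONFIG_DONE


def initialize_chip(dies, params: dict) -> str:
    """CPU-die side: configure every compute die, then check all return codes."""
    pending = {die.die_id for die in dies}
    for die in dies:
        # Stands in for sending configuration parameters over the UCIe bus.
        if die.configure(params) == CONFIG_DONE:
            pending.discard(die.die_id)
    # The chip leaves initialization only when every compute die has acknowledged.
    return "PERIODIC" if not pending else "INIT"


dies = [ComputeDie(i) for i in range(1, 5)]
print(initialize_chip(dies, {"pulse_period_us": 5_000}))  # PERIODIC
```

A real implementation would also need a timeout so that a die that never answers cannot hold the chip in the initialization phase indefinitely; the patent is silent on that point.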
Further, in the periodic working phase, the periodic pulse signals of the first and second computing systems are synchronized with each other, and within each computing system the periodic pulse signals are cross-checked;
if an abnormal periodic pulse signal is detected in one computing system, the CPU die closes that system's computing channel according to the reported error location; if the periodic pulse signals of both computing systems are abnormal, the CPU die (1) reports an error.
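The CPU die's reaction to the pulse detection results reduces to a small decision table. A minimal sketch, assuming each computing system reports a single healthy/abnormal flag (the patent does not define the error-report format); the enum and function names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "keep running"
    CLOSE_SYSTEM_1 = "close computing channel of the first computing system"
    CLOSE_SYSTEM_2 = "close computing channel of the second computing system"
    REPORT_ERROR = "report a chip error"


def handle_pulse_faults(system1_ok: bool, system2_ok: bool) -> Action:
    """CPU-die reaction to the per-system periodic-pulse detection results."""
    if system1_ok and system2_ok:
        return Action.NONE
    if not system1_ok and not system2_ok:
        # Both systems abnormal: the chip is no longer fit for computation.
        return Action.REPORT_ERROR
    # Exactly one system is abnormal: close only that system's channel.
    return Action.CLOSE_SYSTEM_1 if not system1_ok else Action.CLOSE_SYSTEM_2
```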
Further, during the periodic working phase there are four operating modes: performance mode, safety mode, single-system mode, and low-power mode;
performance mode: all four compute dies are working, for processing large volumes of data;
safety mode: the four compute dies are divided into two computing systems, each of which computes the same data. The results of the two systems are returned to the CPU die and compared; if they match, the CPU die outputs the result; if they differ, the CPU die identifies the abnormal computing system from the compute dies' periodic-pulse detection results and outputs the data of the normally working system;
single-system mode: only one computing system works and the other is asleep; the sleeping system keeps only its pulse synchronization and detection functions, and all other functions are turned off;
low-power mode: only one compute die works and the rest are asleep; the sleeping compute dies keep only their pulse synchronization and detection functions, and all other functions are turned off.
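Which compute dies are working in each mode can be summarized in a short sketch. It assumes the numbering used later in the description (dies 1 and 2 form the first computing system, dies 3 and 4 the second) and lets the caller choose which system or die stays active; the names are illustrative.

```python
from enum import Enum


class Mode(Enum):
    PERFORMANCE = "performance"
    SAFETY = "safety"
    SINGLE_SYSTEM = "single-system"
    LOW_POWER = "low-power"


# Assumed numbering: dies 1-2 form the first computing system, dies 3-4 the second.
SYSTEM_1 = (1, 2)
SYSTEM_2 = (3, 4)
ALL_DIES = SYSTEM_1 + SYSTEM_2


def die_states(mode: Mode, active_system=SYSTEM_1, active_die=1) -> dict:
    """Map each compute die to 'working' or 'sleep' for the given operating mode.

    A sleeping die still keeps its pulse synchronization and detection
    functions; that retained behavior is not modeled here.
    """
    if mode in (Mode.PERFORMANCE, Mode.SAFETY):
        working = set(ALL_DIES)          # all four dies compute
    elif mode is Mode.SINGLE_SYSTEM:
        working = set(active_system)     # one whole computing system
    else:
        working = {active_die}           # low-power: a single die
    return {die: ("working" if die in working else "sleep") for die in ALL_DIES}


print(die_states(Mode.LOW_POWER))
# {1: 'working', 2: 'sleep', 3: 'sleep', 4: 'sleep'}
```

Performance and safety mode wake the same dies; they differ in how the CPU die distributes data (split versus duplicated), not in which dies are powered.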
Compared with the prior art, the invention has the following notable effects:
1. Compared with existing Chiplet-architecture chips, the chip of the invention uses periodic pulse signals to synchronize and monitor all compute dies in the chip, and at the same time controls the performance and power consumption of the whole chip;
2. Compared with other chips of similar function, the two compute dies in each computing system monitor each other; if one compute die fails, that system can be shut down protectively without affecting the normal operation of the other computing system in the chip;
3. Compared with traditional SoC technology, the chip of the invention uses a Chiplet architecture built from dies of mature design, which greatly shortens time to market; and because it has four compute dies, it can handle large-scale computing workloads, improving performance while saving considerable cost.
Drawings
FIG. 1 is a schematic diagram of the Chiplet architecture of the chip of the present invention;
FIG. 2 is a schematic diagram of the integrated (packaging) structure of the chip of the present invention;
FIG. 3 is a diagram of the synchronization and detection of 5 ms periodic pulses between the compute dies of the present invention;
FIG. 4 is a workflow diagram of the chip of the present invention;
FIG. 5 is a schematic diagram of the 5 ms periodic pulse signal of FIG. 3;
FIG. 6 is a schematic diagram of communication between the CPU die and the compute and IO dies in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
To address the cost of manufacturing advanced SoCs, the invention provides a heterogeneously integrated chip in which dies made with different process technologies are packaged together using advanced packaging. The dies communicate over the UCIe bus, whose low latency and high speed ensure reliable data transfer between them. Heterogeneous integration of this kind can greatly reduce development cost and shorten time to market.
As shown in Fig. 1, the invention is a special-purpose computing chip based on a Chiplet architecture, comprising one CPU die, four compute dies, four memory units, and one IO die; each memory unit is a stack of four memory dies. The CPU die is not connected directly to the memory units but exchanges data with them through the compute dies. This saves the cost of a separate memory unit dedicated to the CPU die, as well as area and packaging cost for the whole chip. Because the compute dies and the CPU die are linked by the UCIe bus, which offers low latency and high reliability, the speed of data exchange between the CPU die and the memory units is not compromised. The chip is packaged with CoWoS (Chip on Wafer on Substrate) technology: a silicon interposer is added on the substrate for routing between the dies, each die is then mounted on the interposer, and finally the remaining packaging and heat-dissipation materials are applied to complete the package.
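The die inventory and the CPU-to-memory path described above can be written out as a small worked example. The counts come from the description; the assumption that memory unit n sits on compute die n, and the function names, are illustrative only.

```python
CPU_DIES = 1
COMPUTE_DIES = 4
MEMORY_UNITS = 4             # one memory unit per compute die
MEMORY_DIES_PER_UNIT = 4     # each unit is a stack of four memory dies
IO_DIES = 1


def total_dies() -> int:
    """1 CPU + 4 compute + 4 x 4 memory + 1 IO = 22 dies in the package."""
    return CPU_DIES + COMPUTE_DIES + MEMORY_UNITS * MEMORY_DIES_PER_UNIT + IO_DIES


def memory_path(unit: int) -> list:
    """Hops for a CPU access to memory unit `unit` (assumed to sit on compute die `unit`).

    There is no direct CPU-to-memory link, so every access goes through a compute die.
    """
    if not 1 <= unit <= MEMORY_UNITS:
        raise ValueError("memory unit must be between 1 and 4")
    return ["CPU die", f"compute die {unit}", f"memory unit {unit}"]


print(total_dies())      # 22
print(memory_path(3))    # ['CPU die', 'compute die 3', 'memory unit 3']
```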
The main functions of the compute die are: a1) data encryption and decryption: an encryption algorithm built into the compute die encrypts the raw data sent by the CPU die, and a built-in decryption algorithm parses the encrypted data sent by the CPU die; a2) synchronization and detection of the periodic pulse signals; a3) storage compression: a built-in compression algorithm compresses data.
The main functions of the CPU die are: b1) instruction parsing and scheduling: the CPU die parses user instructions, exchanges data with the compute dies, and assigns their working mode according to the user instructions; b2) chip initialization: in the initialization phase, the CPU die issues and collects the initialization configuration parameters of every die; b3) data processing: during periodic operation, the CPU die splits and reassembles the compute dies' data according to the current working mode.
The main functions of the IO die are: c1) data input and output: the IO die receives data arriving on the chip's external pins and sends the chip's internal data out through those pins; c2) serial-to-parallel conversion of chip data, which is needed because the IO die communicates with the CPU die over the UCIe bus.
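Function c2) can be illustrated with a generic serial-to-parallel (deserializer) routine and its inverse. This is a sketch under stated assumptions: an 8-bit word width and MSB-first bit order are chosen for illustration, and it does not model UCIe's actual lane mapping or framing.

```python
def deserialize(bits, width=8):
    """Group a serial bit stream (MSB first) into parallel words of `width` bits."""
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | (bit & 1)   # shift in one bit per clock
        words.append(word)
    return words


def serialize(words, width=8):
    """Inverse direction: expand parallel words back into an MSB-first bit stream."""
    return [(w >> (width - 1 - i)) & 1 for w in words for i in range(width)]


print(hex(deserialize([1, 0, 1, 0, 0, 1, 0, 1])[0]))  # 0xa5
```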
The overall package is shown in Fig. 2. The integrated structure inside the chip comprises: CPU die 1, micro-bumps 2, package shell 3, IO die 4, TSVs 51, memory dies 6, compute dies 7, interposer 8, RDL (redistribution layer, used for horizontal routing) 9, solder balls 10, substrate 11, and copper bumps 21. Four memory dies 6 are connected to one compute die 7 through first TSVs 51, and the first TSVs 51 are connected to the RDL 9 through micro-bumps 2, so that one compute die 7 and four memory dies 6 are combined into a single stack, saving a large amount of chip area. As shown in Fig. 3, the four compute dies 7 are interconnected by physical-layer (PHY) wiring, and each compute die 7 brings out four pins for the synchronization and detection of the periodic pulse signals. Taking the first compute die of the first computing system as an example, its four wires are: one that sends its own system's periodic pulse signal to the other system (the second computing system), one that receives the other system's periodic pulse signal for synchronization, one that sends the periodic pulse signal to the second compute die of its own system, and one that detects the periodic pulse signal returned by that second compute die. The CPU die 1, compute dies 7, and IO die 4 are each connected to the silicon interposer 8 through micro-bumps 2, and the RDL 9 interconnects the dies' signal lines. The silicon interposer 8 is connected to the substrate 11 through second TSVs 52 and copper bumps 21; the package shell 3 is then put in place, and finally the signal lines are brought out of the substrate 11 through solder balls 10, completing the package.
In the invention, the main function of the compute die is to encrypt and decrypt data; it also provides monitoring and traffic-visibility services for the data path, improving the security of the data the chip processes. The compute die additionally integrates memory compression and processing for certain special protocols, which greatly reduces the computational burden on the CPU die. Synchronization and detection of the periodic pulse signals among the compute dies enables synchronized data processing across the compute dies and lets the CPU die allocate and schedule performance; this is implemented by interconnecting the four compute dies through the physical layer (PHY) in the topology shown in Fig. 3. This embodiment uses a 5 ms periodic pulse signal as an example; pulse signals with other periods, such as 5 µs or 10 µs, similar to the one shown in Fig. 5, may also be used. As shown in Fig. 3, in the first computing system the first compute die generates the 5 ms periodic pulse signal and the second compute die checks it and forwards it back to the first compute die; in the second computing system the third compute die generates the 5 ms periodic pulse signal and the fourth compute die checks it and forwards it back to the third compute die. Taking the first computing system as an example, the first compute die not only generates its system's 5 ms periodic pulse signal but also detects the 5 ms pulse signal forwarded by its second compute die, and at the same time synchronizes to the 5 ms pulse signal input from the second computing system, ensuring that the two systems run in the same period.
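The generate, check-and-forward, detect, and synchronize roles can be modeled in a simplified discrete-time sketch. It assumes ideal, zero-latency wiring, represents each pulse by the time of its rising edge, and replaces the mutual cross-system synchronization with a simpler leader-follower alignment; all class and function names are hypothetical.

```python
from dataclasses import dataclass

PERIOD_US = 5_000  # the 5 ms example period


@dataclass
class ComputingSystem:
    name: str
    phase_us: int = 0          # time of the primary die's next pulse edge
    secondary_ok: bool = True  # False models a failed checking/forwarding die

    def primary_emit(self) -> int:
        """Primary die (first or third) generates this period's pulse edge."""
        return self.phase_us

    def secondary_forward(self, edge_us: int):
        """Secondary die (second or fourth) checks the pulse and forwards it back."""
        return edge_us if self.secondary_ok else None

    @staticmethod
    def primary_detect(sent_us: int, returned_us) -> bool:
        """Primary die confirms that its own pulse came back."""
        return returned_us == sent_us


def run_period(leader: ComputingSystem, follower: ComputingSystem) -> dict:
    """One 5 ms period: cross-system sync, then per-system loopback detection."""
    # Simplification: the follower aligns to the leader (the patent describes mutual sync).
    follower.phase_us = leader.phase_us
    status = {}
    for system in (leader, follower):
        edge = system.primary_emit()
        status[system.name] = system.primary_detect(edge, system.secondary_forward(edge))
        system.phase_us += PERIOD_US  # next pulse one period later
    return status


sys1 = ComputingSystem("first")
sys2 = ComputingSystem("second", phase_us=1_234)  # starts out of phase
print(run_period(sys1, sys2))          # {'first': True, 'second': True}
print(sys1.phase_us == sys2.phase_us)  # True
```

A detection result of `False` for one system is what would trigger the CPU die to close that system's computing channel.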
During periodic operation, the 5 ms periodic pulse signals are continuously cross-checked and synchronized so that the chip can switch among its four working modes, dynamically choosing between performance, safety, and low-power operation. In addition, once a 5 ms periodic pulse signal fault occurs, the CPU die shuts down the corresponding computing channel; from then on, the performance and safety modes can no longer be used, and the chip can only switch between the low-power mode and the single-system mode.
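The restriction on modes after a pulse fault can be expressed as a lookup. A minimal sketch, assuming the fault state is summarized as the number of computing systems with a pulse fault; the mode names are plain strings chosen for illustration.

```python
ALL_MODES = frozenset({"performance", "safety", "single-system", "low-power"})


def allowed_modes(faulty_systems: int) -> frozenset:
    """Modes the CPU die may select given how many computing systems have a pulse fault."""
    if faulty_systems == 0:
        return ALL_MODES
    if faulty_systems == 1:
        # The faulty system's channel is closed, so only one system remains usable.
        return frozenset({"single-system", "low-power"})
    if faulty_systems == 2:
        return frozenset()  # both systems abnormal: the CPU die reports an error
    raise ValueError("there are only two computing systems")
```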
The chip operates in two phases: an initialization phase and a periodic working phase, as shown in Fig. 4. After power-up, the chip is in the initialization phase, in which the CPU die 1, compute dies 7, IO die 4, and memory dies 6 all begin initializing. When this is complete, the CPU die 1 sends configuration parameters to the four compute dies 7 over the UCIe bus; once the CPU die 1 has received a configuration-complete code from every compute die 7, initialization of the whole chip is complete and the chip enters the periodic working phase. In this phase, the four compute dies 7 are divided into two systems, the first computing system and the second computing system; the 5 ms periodic pulse signals of the two systems are synchronized with each other, and the 5 ms periodic pulse signals within each system are cross-checked, as shown in Fig. 3. The high level of the 5 ms periodic pulse signal lasts 100 µs, as shown in Fig. 5. In normal periodic operation, the data signals of the IO die 4 and the four compute dies 7 are not connected directly; instead, information is exchanged through the CPU die 1, as shown in Fig. 6.
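The pulse timing gives a duty cycle of 100 µs / 5000 µs = 2 %. The sketch below encodes the ideal waveform and one possible period check; the ±50 µs tolerance and the edge-timestamp representation are assumptions, since the patent does not state how a pulse is judged abnormal.

```python
PERIOD_US = 5_000                  # 5 ms period
HIGH_US = 100                      # 100 us high level
DUTY_CYCLE = HIGH_US / PERIOD_US   # 0.02, i.e. 2 %


def pulse_level(t_us: int) -> int:
    """Ideal waveform: high during the first 100 us of every 5 ms period."""
    return 1 if t_us % PERIOD_US < HIGH_US else 0


def rising_edges_ok(edges_us, tol_us: int = 50) -> bool:
    """Hypothetical detector: consecutive rising edges must be one period apart, +/- tol_us."""
    if len(edges_us) < 2:
        return False  # not enough edges to judge the period
    gaps = [b - a for a, b in zip(edges_us, edges_us[1:])]
    return all(abs(gap - PERIOD_US) <= tol_us for gap in gaps)
```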
For example, external raw data enters the chip through the IO die 4, which forwards it to the CPU die 1 over the UCIe bus. The CPU die 1 checks the current working mode of the compute dies 7. In performance mode, where all four compute dies 7 work simultaneously, the CPU die 1 splits the data into four parts and sends one part to each compute die 7 over the UCIe bus; each compute die 7 returns its result to the CPU die 1 in the next period, the CPU die 1 passes the results to the IO die 4, and the IO die 4 sends them out of the chip. If either computing system detects a fault in its 5 ms periodic pulse signal during periodic operation, the CPU die 1 shuts down that system's computing channel according to the reported error location. The chip can then use only the single-system and low-power modes; its computing performance drops substantially, but it remains usable. If the 5 ms periodic pulse signals of both systems are abnormal, the chip is seriously damaged and no longer fit for computation, and the CPU die 1 reports an error.
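The performance-mode data path can be sketched from the CPU die's point of view. This assumes the data is split into four contiguous, nearly equal byte ranges (the patent does not say how it is divided) and uses a hypothetical `compute` callback in place of the UCIe round trip to each compute die.

```python
from typing import Callable, List


def split_four(data: bytes) -> List[bytes]:
    """Split data into four contiguous, nearly equal chunks (one per compute die)."""
    n = len(data)
    bounds = [n * i // 4 for i in range(5)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(4)]


def performance_round(data: bytes, compute: Callable[[int, bytes], bytes]) -> bytes:
    """CPU-die view of one performance-mode round: split, dispatch, reassemble in order."""
    chunks = split_four(data)
    # In the chip each result returns in the next 5 ms period; here it is immediate.
    results = [compute(die_id, chunk) for die_id, chunk in enumerate(chunks, start=1)]
    return b"".join(results)


print(split_four(b"abcdefghij"))  # [b'ab', b'cde', b'fg', b'hij']
```

Keeping the chunks contiguous means reassembly is a simple in-order concatenation, which matches the description of the CPU die splitting and reassembling data per working mode.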
During periodic operation the chip has four working modes: performance mode, safety mode, single-system mode, and low-power mode.
In performance mode, all four compute dies work simultaneously to process large batches of data, giving high bandwidth and low latency.
Safety mode ensures the safety and reliability of the data. The four compute dies are divided into two systems that compute the same data, so data-processing throughput is only half that of performance mode. The results of the two systems are returned to the CPU die 1 and compared; if they match, the CPU die 1 outputs the result; if they differ, the CPU die 1 determines which system is faulty from the 5 ms periodic-signal detection results within the computing systems and outputs the data of the non-faulty system.
In single-system mode, only one computing system works: the first (or second) computing system is active while the second (or first) is asleep. The sleeping system keeps only its 5 ms synchronization and detection function, and all other functions are off. Keeping this function lets the CPU die 1 switch quickly back to performance or safety mode. Single-system mode reduces power consumption while retaining part of the computing performance.
In low-power mode, only one of the four compute dies works and the other three sleep. The sleeping compute dies keep only their 5 ms pulse synchronization and detection function, and all other functions are off. Keeping this function lets the CPU die 1 switch quickly to single-system, performance, or safety mode. Low-power mode suits data with a small computational load and has very low power consumption.
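The safety-mode comparison can be sketched as a selection function. It follows the rule described for safety mode (output on agreement, otherwise trust the system whose periodic-pulse check passed); raising an error when the results differ but the pulse checks do not single out one system is an assumption, since the description does not cover that case.

```python
def select_result(result_1, result_2, system1_pulse_ok: bool, system2_pulse_ok: bool):
    """One-out-of-two selection performed by the CPU die in safety mode."""
    if result_1 == result_2:
        return result_1                  # both systems agree
    if system1_pulse_ok and not system2_pulse_ok:
        return result_1                  # second system judged faulty
    if system2_pulse_ok and not system1_pulse_ok:
        return result_2                  # first system judged faulty
    # Assumption: the patent does not cover a mismatch with no (or two) pulse faults.
    raise RuntimeError("results differ and the faulty system cannot be identified")
```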
The CPU die 1 handles resource scheduling for the whole chip and processes data from the chip's external interfaces; as the core of the chip, it is fabricated in TSMC's 7 nm EUV process, a more expensive, advanced manufacturing technology. The UCIe protocol is used for communication between the CPU die 1 and the four compute dies 7, and between the CPU die 1 and the IO die 4, as shown in Fig. 6. UCIe (Universal Chiplet Interconnect Express) is a die-to-die interconnect standard jointly proposed by Intel and several other semiconductor companies; its main goal is to unify the interconnect interface between dies and create an open Chiplet ecosystem. UCIe has several advantages: it supports PCIe 6.0, CXL 2.0, and CXL 3.0, and also supports a user-defined Streaming protocol for mapping other transport protocols; its protocol layer packs data into flits for transmission. By replacing the PHY and link-level retry functions of PCIe/CXL with UCIe's die-to-die adapter layer and PHY, a user can build a die-to-die interconnect with lower power and better performance.
The compute dies are divided into two systems. When the chip is in safety mode during periodic operation, the two computing systems compute the same data and send their results to the CPU die 1 for comparison. The CPU die 1 contains a one-out-of-two selection logic unit that decides which system's data to take based on the two results and the periodic-pulse detection results, ensuring the safety and reliability of the data. The periodic pulse signals also synchronize and monitor the four compute dies: if one compute die fails, the remaining compute dies can still work normally in single-system or low-power mode and the chip keeps running, which prevents the data of a computing period from being lost to a chip crash.
The above embodiments merely illustrate the design concept and features of the invention and are intended to enable those skilled in the art to understand and implement it; the scope of the invention is not limited to these embodiments. All equivalent changes or modifications made according to the principles and design ideas of the invention fall within its scope.

Claims (5)

1. The chip based on the Chiplet architecture is characterized by comprising a CPU core particle (1), a tube shell (3), an IO core particle (4), a first computing system, a second computing system, a silicon adapter plate (8) and a substrate (11); the first computing system and the second computing system have the same structure and respectively comprise two storage units and two computing cores (7); each storage unit is respectively arranged on one calculation core particle (7), the storage unit is connected with the calculation core particle (7) through a first TSV through hole (51), and the first TSV through hole (51) is connected with the RDL (9) through the micro-convex point (2); the CPU core particle (1), the calculation core particle (7) and the IO core particle (4) are respectively connected with the silicon adapter plate (8) through the micro-convex points (2); the silicon adapter plate (8) is connected with the substrate (11) through a second TSV through hole (52) and a copper bump (21), and a solder ball (10) is arranged at the bottom of the substrate (11); the CPU core particle (1), the IO core particle (4), the storage unit, the calculation core particle (7), the silicon adapter plate (8) and the substrate (11) are all arranged in the tube shell (3);
the CPU core particle (1) performs data interaction with the IO core particle (4), the first computing system and the second computing system through the UCie bus respectively; the first computing system comprises two storage units, a first computing core particle and a second computing core particle, wherein the first computing core particle generates a periodic pulse signal and detects the periodic pulse signal forwarded by the second computing core particle, and the second computing core particle is responsible for checking and forwarding the periodic pulse signal to the first computing core particle; at the same time, the first calculation core particle synchronizes the periodic pulse signal input by the second calculation system.
2. Chip based on the Chiplet architecture according to claim 1, characterized in that each memory unit comprises four identical memory dies (6), the four memory dies (6) are stacked on one computing die (7) in turn, each memory die (6) is connected to the computing die (7) through a first TSV via (51), respectively.
3. A chip control method based on the Chiplet architecture, characterized in that the operation of the chip according to any one of claims 1 to 2 comprises an initialization phase and a periodic working phase; after power-on, the chip enters the initialization phase, in which the CPU core particle (1), the computing core particles (7), the IO core particle (4) and the storage core particles (6) are initialized;
after initialization is complete, the CPU core particle (1) sends configuration parameters to each computing core particle (7) over the UCIe bus; once the CPU core particle (1) has received configuration-complete back codes from all computing core particles (7), initialization of the whole chip is complete and the chip enters the periodic working phase.
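The initialization handshake of claim 3 behaves like a small state machine on the CPU side: initialize, send parameters, then wait until every computing core particle has returned its back code. The sketch below assumes that framing; `InitSequencer`, `Phase` and the parameter dictionaries are hypothetical, and "sending over the UCIe bus" is reduced to storing the parameters.

```python
from enum import Enum, auto
from typing import Dict, Iterable


class Phase(Enum):
    POWER_ON = auto()
    INITIALIZING = auto()   # every core particle runs its own initialization
    CONFIGURING = auto()    # CPU has sent parameters, waiting for back codes
    PERIODIC = auto()       # periodic working phase


class InitSequencer:
    """Hypothetical CPU-side view of the initialization phase (claim 3)."""

    def __init__(self, computing_cores: Iterable[str]) -> None:
        self.pending = set(computing_cores)
        self.phase = Phase.POWER_ON
        self.params: Dict[str, dict] = {}

    def power_on(self) -> None:
        self.phase = Phase.INITIALIZING

    def send_config(self, params: Dict[str, dict]) -> None:
        # Sending over the UCIe bus is modelled as storing the parameters.
        if self.phase is not Phase.INITIALIZING:
            raise RuntimeError("configuration is only sent after initialization")
        self.params = dict(params)
        self.phase = Phase.PERIODIC if not self.pending else Phase.CONFIGURING

    def on_back_code(self, core: str) -> None:
        # A "configuration complete" back code from one computing core particle;
        # back codes outside the configuring step are ignored.
        if self.phase is not Phase.CONFIGURING:
            return
        self.pending.discard(core)
        if not self.pending:
            self.phase = Phase.PERIODIC
```

Using a set of pending core particles makes duplicate back codes harmless: the chip only leaves the configuring step once every distinct computing core particle has answered.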
4. The chip control method based on the Chiplet architecture according to claim 3, wherein, in the periodic working phase, the first computing system and the second computing system synchronize their periodic pulse signals with each other, and within each computing system the computing core particles detect each other's periodic pulse signals;
if a computing system detects that its periodic pulse signal is abnormal, the CPU core particle (1) shuts down the computing channel of that system according to the reported error location;
if the periodic pulse signals of both computing systems are abnormal, the CPU core particle (1) reports an error.
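The fault-handling rule of claim 4 reduces to a three-way decision per check period. A minimal sketch, assuming each computing system's pulse check is summarized as a boolean and that the error location is simply the system name; the function name and the action strings are hypothetical.

```python
from typing import Dict, List


def handle_pulse_status(status: Dict[str, bool]) -> List[str]:
    """Return hypothetical CPU actions for one periodic check.

    status maps a computing-system name to True if its periodic pulse
    signal was judged normal in this period.
    """
    faulty = sorted(name for name, ok in status.items() if not ok)
    if not faulty:
        return []
    if len(faulty) == len(status):
        # Both computing systems abnormal: the CPU core particle reports an error.
        return ["report_error"]
    # One system abnormal: close that system's computing channel,
    # using the reported error location to pick it.
    return [f"close_channel:{name}" for name in faulty]


print(handle_pulse_status({"sys1": True, "sys2": False}))  # ['close_channel:sys2']
```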
5. The chip control method based on the Chiplet architecture according to claim 3, wherein the periodic working phase has four operating modes: performance mode, safety mode, single-system mode and low-power mode;
performance mode: all four computing core particles are active and are used to process large amounts of data;
safety mode: the four computing core particles are divided into two computing systems, and both systems compute the same data; the results of the two computing systems are returned to the CPU core particle (1) and compared; if the results are identical, the CPU core particle (1) outputs the result; if they differ, the CPU core particle (1) identifies the abnormal computing system from the computing core particles' detection results for the periodic pulse signal, and outputs the data of the computing system that is working normally;
single-system mode: only one computing system works and the other is in a sleep state; the sleeping computing system keeps only its synchronization and detection functions, and all its other functions are shut down;
low-power mode: only one computing core particle works and the remaining computing core particles are in sleep mode; a sleeping computing core particle keeps only its pulse-signal synchronization and detection functions, and all its other functions are shut down.
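The four operating modes and the safety-mode result selection of claim 5 can be sketched as follows. This is an illustration under stated assumptions: the claim does not say which system stays awake in single-system mode or which core particle stays awake in low-power mode, so the choice of `sys1` / `sys1.core1` is arbitrary, and all names here are hypothetical. The claim also does not cover the case where the results differ but both (or neither) pulse checks are normal; the sketch returns `None` there.

```python
from enum import Enum
from typing import Any, Dict, Optional, Set

CORES = ("sys1.core1", "sys1.core2", "sys2.core1", "sys2.core2")


class Mode(Enum):
    PERFORMANCE = "performance"
    SAFETY = "safety"
    SINGLE_SYSTEM = "single-system"
    LOW_POWER = "low-power"


def active_cores(mode: Mode) -> Set[str]:
    """Computing core particles doing full work in each mode (assumed choice)."""
    if mode in (Mode.PERFORMANCE, Mode.SAFETY):
        return set(CORES)
    if mode is Mode.SINGLE_SYSTEM:
        return {c for c in CORES if c.startswith("sys1.")}
    return {"sys1.core1"}  # LOW_POWER


def sleeping_cores(mode: Mode) -> Set[str]:
    # Sleeping core particles keep only pulse synchronization and detection.
    return set(CORES) - active_cores(mode)


def safety_select(results: Dict[str, Any], pulse_ok: Dict[str, bool]) -> Optional[Any]:
    """Safety mode: compare both systems' results for the same input data."""
    r1, r2 = results["sys1"], results["sys2"]
    if r1 == r2:
        return r1
    # Results differ: output the system whose pulse check was normal.
    healthy = [s for s in ("sys1", "sys2") if pulse_ok.get(s, False)]
    if len(healthy) == 1:
        return results[healthy[0]]
    return None  # not covered by the claim; a caller might escalate this


print(safety_select({"sys1": 42, "sys2": 41}, {"sys1": True, "sys2": False}))  # 42
```

Safety mode is essentially dual modular redundancy: comparison alone detects a disagreement, and the pulse-signal health check is what lets the CPU core particle decide which of the two results to trust.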
CN202211183717.5A 2022-09-27 2022-09-27 Chip based on Chiplet architecture and control method Active CN115617739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211183717.5A CN115617739B (en) 2022-09-27 2022-09-27 Chip based on Chiplet architecture and control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211183717.5A CN115617739B (en) 2022-09-27 2022-09-27 Chip based on Chiplet architecture and control method

Publications (2)

Publication Number Publication Date
CN115617739A (en) 2023-01-17
CN115617739B (en) 2024-02-23

Family

ID=84860980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211183717.5A Active CN115617739B (en) 2022-09-27 2022-09-27 Chip based on Chiplet architecture and control method

Country Status (1)

Country Link
CN (1) CN115617739B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116414212B (en) * 2023-04-13 2024-02-13 海光信息技术股份有限公司 Core particle and control method for core particle
CN116256620B (en) * 2023-05-15 2023-07-14 中诚华隆计算机技术有限公司 Chiplet integrated chip detection method and device, electronic equipment and storage medium
CN116302899B (en) * 2023-05-18 2023-07-28 中诚华隆计算机技术有限公司 Core particle fault diagnosis method and device
CN116743317B (en) * 2023-06-29 2024-01-23 上海奎芯集成电路设计有限公司 Data transmission method based on universal chip interconnection standard
CN116992820B (en) * 2023-09-27 2024-01-09 之江实验室 Scalable intelligent computing chip structure based on core particle integration
CN117377327A (en) * 2023-12-05 2024-01-09 荣耀终端有限公司 Packaging structure, packaging chip and electronic equipment
CN117610469A (en) * 2024-01-23 2024-02-27 芯来智融半导体科技(上海)有限公司 Core particle and topological structure based on core particle

Citations (12)

Publication number Priority date Publication date Assignee Title
KR20010076790A (en) * 2000-01-28 2001-08-16 오길록 I/O-based high availability through middleware in the COTS RTOS
RU2444053C1 (en) * 2010-08-05 2012-02-27 Федеральное государственное унитарное предприятие "Научно-производственное объединение автоматики имени академика Н.А. Семихатова" Computer system
WO2018121118A1 (en) * 2016-12-26 2018-07-05 上海寒武纪信息科技有限公司 Calculating apparatus and method
CN109558370A (en) * 2017-09-23 2019-04-02 成都海存艾匹科技有限公司 Three-dimensional computations encapsulation
CN112149369A (en) * 2020-09-21 2020-12-29 交叉信息核心技术研究院(西安)有限公司 Multi-core packaging level system based on core grain framework and core grain-oriented task mapping method thereof
CN112562767A (en) * 2020-12-29 2021-03-26 国家数字交换系统工程技术研究中心 On-chip software definition interconnection network device and method
CN112582390A (en) * 2019-09-27 2021-03-30 英特尔公司 Packaged device with chiplets including memory resources
CN112613264A (en) * 2020-12-25 2021-04-06 南京蓝洋智能科技有限公司 Distributed extensible small chip design framework
CN113986817A (en) * 2021-12-30 2022-01-28 中科声龙科技发展(北京)有限公司 Method for accessing in-chip memory area by operation chip and operation chip
CN114721993A (en) * 2022-04-08 2022-07-08 北京灵汐科技有限公司 Many-core processing device, data processing method, data processing equipment and medium
CN114823592A (en) * 2022-06-30 2022-07-29 之江实验室 On-wafer system structure and preparation method thereof
CN114899185A (en) * 2022-07-12 2022-08-12 之江实验室 Integrated structure and integrated method suitable for wafer-level heterogeneous core particles

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US10755201B2 (en) * 2018-02-14 2020-08-25 Lucid Circuit, Inc. Systems and methods for data collection and analysis at the edge
US10742217B2 (en) * 2018-04-12 2020-08-11 Apple Inc. Systems and methods for implementing a scalable system
KR20200025200A (en) * 2018-08-29 2020-03-10 삼성전자주식회사 Electronic devices and methods of operating electronic devices

Non-Patent Citations (2)

Title
Chiplet-based System PSI Optimization for 2.5D/3D Advanced Packaging Implementation;Yoonjae Hwang;《2022 IEEE 72nd Electronic Components and Technology Conference (ECTC)》;1-6 *
IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management;Pascal Vivet;《IEEE Journal of Solid-State Circuits ( Volume: 56, Issue: 1, January 2021)》;79-97 *

Also Published As

Publication number Publication date
CN115617739A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
CN115617739B (en) Chip based on Chiplet architecture and control method
CN102449614B (en) Packetized interface for coupling agents
CN107431061B (en) Method and circuit for communication in multi-die package
US9979432B2 (en) Programmable distributed data processing in a serial link
US10423567B2 (en) Unidirectional clock signaling in a high-speed serial link
US10410694B1 (en) High bandwidth chip-to-chip interface using HBM physical interface
US10282341B2 (en) Method, apparatus and system for configuring a protocol stack of an integrated circuit chip
CN113986797A (en) Interface bridge between integrated circuit dies
Farjadrad et al. A bunch-of-wires (BoW) interface for interchiplet communication
KR101679333B1 (en) Method, apparatus and system for single-ended communication of transaction layer packets
CN112860612A (en) Interface system for interconnecting bare core and MPU and communication method thereof
CN115050727B (en) Wafer processor and circuit self-test and power supply management device used for same
TWI312076B (en) Apparatus and related method for chip i/o test
Taylor et al. High capacity on-package physical link considerations
WO2023129304A1 (en) Lane repair and lane reversal implementation for die-to-die (d2d) interconnects
CN115589226A (en) Three-dimensional programmable logic circuit system and method
Liao et al. Exploring AMBA AXI on-chip interconnection for TSV-based 3D SoCs
Vinnakota The open domain-specific architecture: Next steps to production
TWI804972B (en) Communication system between dies and operation method thereof
Liao et al. A Scalable Die-to-Die Interconnect with Replay and Repair Schemes for 2.5 D/3D Integration
CN112835847A (en) Distributed interrupt transmission method and system for interconnected bare core

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant