CN113051111A - Multi-chip module fault identification processing method and system - Google Patents

Multi-chip module fault identification processing method and system

Publication number
CN113051111A
Authority
CN
China
Prior art keywords
single chip
chip
mcm
primary
data link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110249196.8A
Other languages
Chinese (zh)
Other versions
CN113051111B (en
Inventor
黄炜
钟雨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202110249196.8A priority Critical patent/CN113051111B/en
Publication of CN113051111A publication Critical patent/CN113051111A/en
Application granted granted Critical
Publication of CN113051111B publication Critical patent/CN113051111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2289Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by configuration test
    • G06F11/26Functional testing

Abstract

The invention provides a multi-chip module fault identification processing method and a system, wherein the multi-chip module fault identification processing method comprises the following steps: starting an MCM multichip, the MCM multichip comprising at least one single chip; screening a primary single chip with a stably connected control link from the at least one single chip by checking connectivity of the control link in the MCM multi-chip; replanning and configuring a control link and a data link of the primary single chip; checking a data link of the primary single chip; and taking a primary single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multi-chip through the secondary single chip. The invention can reduce the manufacturing cost of the MCM multichip and improve the reliability of the MCM multichip.

Description

Multi-chip module fault identification processing method and system
Technical Field
The invention relates to the technical field of MCM (Multi-chip Module), in particular to a method and a system for recognizing and processing faults of a Multi-chip Module.
Background
In current high-performance computing chips, a modular architecture built from multiple Dies is widely used for reasons of cost and scalability. That is, in the development stage, a complete SOC (System on Chip) is designed in units of a Die (bare chip), together with an extensible interface. In the packaging stage, according to the requirements of different chip product lines, several Dies are packaged into one Package and interconnected through a high-speed bus to form chips of different performance levels.
In the actual chip manufacturing process, single chips are produced in a pipelined manner in units of wafers, and each single chip becomes a Die in subsequent packaging. Of the Dies destined for the same MCM multichip, some are qualified, some are unqualified, and some are faulty. Therefore, before packaging, the packaging factory can remove unqualified chips and select qualified ones for subsequent packaging, which improves the single-chip yield and thus the overall yield of packaged MCM multichips.
However, after the MCM multichip has been packaged, one or more Dies in it may fail because of inter-Die connection failures, collisions during transportation, and the like; one or more Dies may also be damaged by long-term wear during use.
Consequently, once a fault is found, the packaged MCM multichip can only be discarded as a whole, which indirectly increases the manufacturing cost of the MCM multichip and reduces its overall reliability.
Disclosure of Invention
To solve the above problems, the multi-chip module fault identification processing method and system provided by the invention dynamically detect faulty chips in the MCM multichip from the perspectives of both the control link and the data link, which effectively improves the utilization of the MCM multichip, reduces its manufacturing cost, and improves its reliability.
In a first aspect, the present invention provides a method for identifying and processing a multi-chip module fault, including:
starting an MCM multichip, the MCM multichip comprising at least one single chip;
screening a primary single chip with a stably connected control link from the at least one single chip by checking connectivity of the control link in the MCM multi-chip;
replanning and configuring a control link and a data link of the primary single chip;
checking a data link of the primary single chip;
and taking a primary single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multi-chip through the secondary single chip.
Optionally, before the replanning and configuring the control link and the data link of the primary single chip, the method further includes:
repeatedly executing the step of screening the primary single chip with the stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multi-chip, and recording the number of the screened primary single chips each time;
selecting the primary single chips screened out in the pass with the smallest screened number as primary good single chips;
the replanning and configuring of the control link and the data link of the primary single chip comprises:
replanning and configuring the control link and the data link of the primary good single chip;
the checking of the data link of the primary single chip comprises:
checking the data link of the primary good single chip;
the taking of a primary single chip with a stably connected data link as a secondary single chip and realizing the function of the MCM multichip through the secondary single chip comprises:
taking a primary good single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multichip through the secondary single chip.
Optionally, the method further comprises:
performing a shielding operation on any single chip, among the at least one single chip, whose control link is unstably connected.
Optionally, the method further comprises:
and performing low-power consumption processing on the single chip in which the control link is unstably connected in the at least one single chip.
Optionally, before the checking the data link of the primary single chip, the method further comprises:
checking the control link of the primary good single chip, and, if the control link of at least one primary good single chip is unstable, ending the multi-chip module fault identification processing method.
Optionally, the method further comprises:
performing a shielding operation on any primary good single chip whose data link is unstably connected;
the realizing of the function of the MCM multichip through the secondary single chip comprises:
replanning and configuring a control link and a data link of the secondary single chip according to the number and the index of the secondary single chip;
and checking whether the data link of the secondary single chip is stably connected or not, if not, ending the multi-chip module fault identification processing method, and if so, adjusting the working frequency of the MCM multi-chip according to the number of the secondary single chips and the working state of the MCM multi-chip so that the secondary single chip can realize the function of the MCM multi-chip.
Optionally, the method further comprises:
performing low-power processing on any primary good single chip whose data link is unstably connected.
In a second aspect, the present invention provides a multi-chip module fault identification processing system, including:
a boot module configured to boot an MCM multichip, the MCM multichip comprising at least one single chip;
a screening module configured to screen a primary single chip with a control link stably connected from the at least one single chip by checking connectivity of the control link in the MCM multichip;
an adjustment module configured to re-plan and configure a control link and a data link of the primary single chip;
a first inspection module configured to inspect a data link of the primary single chip;
and the processing module is configured to take the primary single chip with the stably connected data link as a secondary single chip and realize the functions of the MCM multi-chip through the secondary single chip.
Optionally, the system further comprises:
a repeated execution module configured to repeatedly execute the step of screening the primary single chip with the stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multi-chip before replanning and configuring the control link and the data link of the primary single chip, and record the number of the primary single chips screened each time;
a selection module configured to select the primary single chips screened out in the pass with the smallest screened number as primary good single chips;
the adjustment module is further configured to replan and configure the control link and the data link of the primary good single chip;
the first inspection module is further configured to check the data link of the primary good single chip;
the processing module is further configured to take a primary good single chip with a stably connected data link as a secondary single chip, and to realize the function of the MCM multichip through the secondary single chip.
Optionally, the system further comprises:
a first shielding module configured to shield any single chip, among the at least one single chip, whose control link is unstably connected.
Optionally, the system further comprises:
and the first low-power consumption processing module is configured to perform low-power consumption processing on the single chip in which the control link is unstably connected in the at least one single chip.
Optionally, the system further comprises:
a second checking module configured to check the control link of the primary good single chip before the data link of the primary good single chip is checked, and to end the multi-chip module fault identification processing if the control link of at least one primary good single chip is unstable.
Optionally, the system further comprises:
a second shielding module configured to shield any primary good single chip whose data link is unstably connected;
the processing module comprises:
the adjusting submodule is configured to replan and configure a control link and a data link of the secondary single chip according to the number and the index of the secondary single chip;
a checking submodule configured to check whether the data link of the secondary single chip is stably connected; if not, to end the multi-chip module fault identification processing; and if so, to adjust the working frequency of the MCM multichip according to the number of secondary single chips and the working state of the MCM multichip, so that the secondary single chips can realize the function of the MCM multichip.
Optionally, the system further comprises:
a second low-power processing module configured to perform low-power processing on any primary good single chip whose data link is unstably connected.
With the multi-chip module fault identification processing method and system provided by the embodiments of the invention, faulty chips in the MCM multichip are dynamically detected from the perspectives of both the control link and the data link, which effectively improves the utilization of the MCM multichip, reduces its manufacturing cost, and improves its reliability.
Drawings
FIG. 1 is a schematic flow chart of a multi-chip module fault identification processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic block diagram of an MCM multichip of an embodiment of the application;
FIG. 3 is a schematic structural diagram of a control link between Dies according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram showing the connection relationship of the control links between Dies in scenario one according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram showing the connection relationship of the control links between Dies in scenario two according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram showing the connection relationship of the control links between Dies in scenario three according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of a multi-chip module fault identification processing system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, the present invention explains the terms involved, specifically as follows:
Die: a single chip before packaging that contains a complete design; referred to as a "single chip" in the invention;
Package: the packaged chip; referred to as a "package" in the invention;
SOC: System on Chip, a chip containing a complete integrated-circuit system with a dedicated purpose; referred to as a "system-on-chip" in the invention;
MCM: Multi-chip Module, a module in which multiple Dies are packaged together; referred to as a "multi-chip module" in the invention. An MCM multichip is a chip produced using MCM technology.
In a first aspect, this embodiment provides a method for identifying and processing a multi-chip module fault, which, with reference to fig. 1, includes steps S101 to S105:
Step S101: starting an MCM multichip, the MCM multichip comprising at least one single chip.
In this embodiment, the MCM multichip may be started in either of two ways: a cold start, or a restart after the MCM multichip malfunctions during operation. The method can therefore perform fault identification processing on an MCM multichip both after transportation and after a period of use.
Step S102: screening a primary single chip with a stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multichip.
Step S103: replanning and configuring the control link and the data link of the primary single chip.
Step S104: checking the data link of the primary single chip.
Step S105: taking a primary single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multichip through the secondary single chip.
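As a minimal sketch only, steps S102 to S105 can be read as a two-stage filter over the Die indexes. The function name and the three callbacks below are hypothetical stand-ins for the hardware checks and the replanning step, which this document does not specify at the code level.

```python
from typing import Callable, Iterable, List


def identify_and_process(
    dies: Iterable[int],
    control_link_ok: Callable[[int], bool],  # hypothetical control link check
    data_link_ok: Callable[[int], bool],     # hypothetical data link check
    replan: Callable[[List[int]], None],     # hypothetical replanning hook
) -> List[int]:
    """Return the indexes of the secondary single chips (steps S102 to S105)."""
    # S102: keep only the dies whose control link is stably connected.
    primary = [d for d in dies if control_link_ok(d)]
    # S103: replan and configure control and data links for the primary dies.
    replan(primary)
    # S104 and S105: primary dies whose data link is also stable become secondary.
    return [d for d in primary if data_link_ok(d)]


# Illustrative run: pretend Die3 has a bad control link and Die2 a bad data link.
secondary = identify_and_process(
    range(4),
    control_link_ok=lambda d: d != 3,
    data_link_ok=lambda d: d != 2,
    replan=lambda primary: None,
)
print(secondary)  # prints [0, 1]
```

In this illustration the function of the MCM multichip would then be realized through Die0 and Die1 only.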
In an optional embodiment, before step S103, the method further includes: repeatedly executing the step of screening, from the at least one single chip, the primary single chips with a stably connected control link by checking the connectivity of the control link in the MCM multichip, and recording the number of primary single chips screened out in each pass together with their indexes; and, according to the indexes, selecting the primary single chips screened out in the pass with the smallest number as primary good single chips. Taking the most conservative pass ensures that the control links of the selected single chips are stably connected.
Further, when primary good single chips have been determined, step S103 includes: replanning and configuring the control link and the data link of the primary good single chips; step S104 includes: checking the data link of the primary good single chips; and step S105 includes: taking a primary good single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multichip through the secondary single chip.
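The repeated screening described above can be illustrated by the following sketch, which runs the screening several times and keeps the pass that found the fewest stable single chips. The name `select_primary_good` and the `screen_once` callback are assumptions made for illustration; the number of repetitions is left to the caller, since the document does not limit it.

```python
from typing import Callable, List, Sequence


def select_primary_good(
    screen_once: Callable[[], Sequence[int]],  # hypothetical: one screening pass
    repetitions: int,
) -> List[int]:
    """Repeat the control link screening and keep the smallest result."""
    if repetitions < 1:
        raise ValueError("at least one screening pass is required")
    # Record the indexes found by each pass; the count is the list length.
    passes = [list(screen_once()) for _ in range(repetitions)]
    # The pass with the fewest dies is the most conservative one.
    return min(passes, key=len)


# Illustrative run with three recorded passes.
recorded = iter([[0, 1, 2], [0, 1], [0, 1, 2]])
print(select_primary_good(lambda: next(recorded), 3))  # prints [0, 1]
```

A Die that passes only some of the time is treated as unstable under this rule, which is why the smallest pass is kept rather than the largest.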
In an optional embodiment, the method further comprises: performing a shielding operation on any single chip, among the at least one single chip, whose control link is unstably connected. Shielding such single chips ensures that the core single chip in the MCM multichip can stably control the remaining single chips whose control links are stably connected.
In an optional embodiment, the method further comprises: performing low-power processing on any single chip, among the at least one single chip, whose control link is unstably connected. This effectively reduces the share of power consumed by single chips that are shielded or otherwise not participating in the MCM multichip, so that the single chips that keep a stable control link to the core single chip can complete their work efficiently.
In an optional embodiment, before the checking of the data link of the primary single chip, the method further comprises: checking the control link of the primary good single chips, and, if the control link of at least one primary good single chip is unstable, ending the multi-chip module fault identification processing method.
Specifically, the chip is first restarted and the control links between the primary good single chips are checked again; the number and indexes of the primary good single chips are then acquired; finally, it is checked whether the selected primary good single chips can be stably connected through the control link, and if the check fails, the method exits.
Checking the connectivity of the control links of the primary good single chips after replanning guards against a control link that appeared stable during screening turning out to be unstable, so that such an MCM multichip can be discarded as early as possible.
In an optional embodiment, the method further comprises: performing a shielding operation on any primary good single chip whose data link is unstably connected.
Thus, the implementation of the function of the MCM multichip by the secondary single chip in the method includes: replanning and configuring a control link and a data link of the secondary single chip according to the number and the index of the secondary single chip; and checking whether the data link of the secondary single chip is stably connected or not, if not, ending the multi-chip module fault identification processing method, and if so, adjusting the working frequency of the MCM multi-chip according to the number of the secondary single chips and the working state of the MCM multi-chip so that the secondary single chip can realize the function of the MCM multi-chip.
Specifically, first, once the control links between the primary good single chips are stable and the control links and data links between them have been planned and configured, whether the data links between the primary good single chips are normal is checked. A primary good single chip that cannot be connected normally over the data link is judged to be a faulty single chip with an abnormal data link connection. Then, after the faulty single chip has been shielded, the control links and data links between the remaining primary good single chips, namely the secondary single chips, are replanned according to their number and indexes. Next, according to the number of secondary single chips that are stably connected through both the control link and the data link, and taking the system input voltage and load conditions into account, the maximum working frequency of the MCM multichip is raised to compensate for the performance lost by shielding the faulty single chip. After the MCM multichip completes its subsequent initialization, the operating system is started, and it runs normally using the resources on the single chips that are stably connected through both the control link and the data link. In this way, single chips with unstably connected data links are shielded while the single chips whose data links and control links are both stable remain usable, which further improves the utilization of the MCM multichip.
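The document does not state how the maximum working frequency is derived from the number of secondary single chips, the input voltage, and the load. The sketch below therefore uses a purely hypothetical policy for illustration: scale the base frequency by the ratio of total to remaining single chips, then cap the result by a headroom value that stands in for whatever the voltage and load conditions permit.

```python
def compensated_max_frequency(
    base_mhz: float,
    total_dies: int,
    secondary_dies: int,
    headroom_mhz: float,
) -> float:
    """Hypothetical compensation rule; not taken from the patent text."""
    if not 0 < secondary_dies <= total_dies:
        raise ValueError("secondary_dies must be in 1..total_dies")
    if headroom_mhz < 0:
        raise ValueError("headroom_mhz must be non-negative")
    # Ideal target: make up for the lost dies entirely through frequency.
    target = base_mhz * total_dies / secondary_dies
    # Assumed cap: voltage and load only allow a limited increase.
    return min(target, base_mhz + headroom_mhz)


# Two of four dies remain, and the assumed headroom is 400 MHz.
print(compensated_max_frequency(2000.0, 4, 2, 400.0))  # prints 2400.0
```

With all four single chips usable the function returns the base frequency unchanged, so the compensation only takes effect after shielding.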
In an optional embodiment, the method further comprises: performing low-power processing on any primary good single chip whose data link is unstably connected. This effectively reduces the share of power consumed by single chips that are shielded or otherwise not participating in the MCM multichip, so that the single chips that keep a stable control link to the core single chip can complete their work efficiently.
With the multi-chip module fault identification processing method, faulty single chips are detected dynamically; after they are shielded, the data links and control links are re-established according to the remaining single chips, so that the faulty single chips are shielded and the remaining single chips in the MCM multichip operate normally, exploiting the value of the MCM multichip as fully as possible. In particular, for an MCM multichip that fails because of wear during long-term operation, the system can be brought back into use by restarting the MCM multichip and executing this strategy, which improves the reliability and availability of the MCM multichip.
In addition, the information of the fault single chip acquired in the dynamic detection process can be used for feeding back to a packaging department for technical improvement and fault positioning analysis.
In a second aspect, the present embodiment provides a method for identifying and processing a multi-chip module fault. In this embodiment, with reference to FIG. 2, the MCM multichip contains four Dies: Die0, Die1, Die2, and Die3.
The method comprises the following steps:
step 1: an MCM multichip which fails at the time of starting or an MCM multichip which fails during operation is mounted on a substrate, and a restart operation is performed on the MCM multichip.
During the restarting process, the operating system can acquire the number of Dies in the MCM multichip and the connection relation between the control link and the data link among the Dies through indexes. In this example, each time the control link or data link check between the Die and the corresponding planning and configuration can be implemented in a restart manner, which is not described in detail in this embodiment.
Step 2: checking whether the control links between the Dies in the MCM multichip are normally connected.
If all control links among the Dies in the MCM multichip are normally connected, the control link check is judged to have passed, and the flow proceeds to checking the data links of the MCM multichip, namely step 3.
If a control link among the Dies in the MCM multichip is abnormal, the control links among the four Dies are first checked to determine the Dies whose control links can be connected normally; the control links and data links of those Dies are replanned and configured, and the other Dies are shielded and put into low-power processing. Then the data links of the Dies whose control links can be connected normally are checked; that is, the flow proceeds to checking the data links of the replanned MCM multichip, namely step 3.
Further, with reference to FIG. 3, the control links between the four Dies use a U-shaped connection. In the control link checking stage, Die0 is the core single chip, and this example checks the connectivity between Die0 and the other Dies step by step in order of increasing Die index. With this detection method, faulty chip identification falls into one of the following three scenarios:
Scenario one: with reference to FIG. 4, only one Die can be identified on the control path; that is, the control path connectivity check between Die0 and Die1 fails, and the check exits. The system therefore records that the number of Dies with a stably connected control link is one, namely Die0.
Scenario two: with reference to FIG. 5, two Dies can be identified on the control path; that is, the connectivity check between Die0 and Die1 succeeds, the connectivity check between Die1 and Die2 fails, and the check exits. The system therefore records that the number of Dies with a stably connected control link is two, namely Die0 and Die1.
Scenario three: with reference to FIG. 6, three Dies can be identified on the control path; that is, the connectivity checks between Die0 and Die1 and between Die1 and Die2 succeed, the connectivity check between Die2 and Die3 fails, and the check exits. The system therefore records that the number of Dies with a stably connected control link is three, namely Die0, Die1, and Die2.
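The three scenarios above share one procedure: start at Die0, test each hop in order of increasing index, and stop at the first failure. A minimal sketch follows, assuming the U-shaped control link can be modeled as a simple chain Die0, Die1, Die2, Die3; the function name and the `link_ok` callback are hypothetical.

```python
from typing import Callable, List


def walk_control_chain(
    num_dies: int,
    link_ok: Callable[[int, int], bool],  # hypothetical hop check (from, to)
) -> List[int]:
    """Return the dies reachable from Die0 before the first failed hop."""
    stable = [0]  # Die0 is the core single chip and the starting point.
    for nxt in range(1, num_dies):
        if not link_ok(nxt - 1, nxt):
            break  # exit the check at the first failed hop
        stable.append(nxt)
    return stable


# Scenario two: the hop between Die1 and Die2 fails.
broken = {(1, 2)}
print(walk_control_chain(4, lambda a, b: (a, b) not in broken))  # prints [0, 1]
```

Because the walk stops at the first failure, Dies beyond a broken hop are never counted, even if their own links are intact.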
In this example, the number of times the screening step of the first aspect (screening, from the at least one single chip, the primary single chips with a stably connected control link by checking the connectivity of the control link in the MCM multichip) is repeated is not limited.
According to the recorded results, however: if all three scenarios occur, the number of primary good single chips is determined to be one, namely Die0; if scenario two and scenario three of the three scenarios occur, the number is determined to be two, namely Die0 and Die1; if only scenario three occurs, the number is determined to be three, namely Die0, Die1, and Die2. Other cases are not described in detail in this embodiment.
Taking the case of two primary good single chips as an example, the MCM multichip performs the shielding operation and low-power processing on Die2 and Die3 during the initialization phase, that is, before the MCM multichip loads the operating system.
The MCM multichip performs the shielding operation on Die2 and Die3 to put them into a blocked state. Specifically, through initialization operations the MCM multichip removes Die2 and Die3 from the inter-Die control path connections, so that subsequent access control no longer reaches Die2 and Die3; in addition, the number of Dies expected to be connected across the MCM multichip is modified, so that subsequent multi-Die synchronization proceeds without Die2 and Die3.
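The two effects of the shielding operation described above (removing a Die from the control path and lowering the expected Die count used for synchronization) can be sketched with a small bookkeeping structure. The class `McmTopology` and its members are hypothetical names chosen for illustration; real firmware would program hardware registers instead.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class McmTopology:
    """Hypothetical record of which dies take part in the MCM multichip."""
    active: List[int]
    masked: Set[int] = field(default_factory=set)

    def mask(self, die: int) -> None:
        # Remove the die from the control path so later accesses skip it.
        if die in self.active:
            self.active.remove(die)
            self.masked.add(die)

    @property
    def expected_die_count(self) -> int:
        # Multi-die synchronization waits for this many dies only.
        return len(self.active)

    def can_access(self, die: int) -> bool:
        return die in self.active


topo = McmTopology(active=[0, 1, 2, 3])
topo.mask(2)
topo.mask(3)
print(topo.active, topo.expected_die_count)  # prints [0, 1] 2
```

Keeping the expected count derived from the active list means synchronization can never wait on a Die that has been shielded.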
The MCM multichip may also perform low-power processing on Die2 and Die3. Taking an MCM multichip built on the ARM (Advanced RISC Machine) architecture as an example, the MCM multichip may use the WFI (wait for interrupt) instruction of the ARM instruction set during the initialization stage to put the cores of Die2 and Die3 into a standby state. This stops the initialization flow of Die2 and Die3, leaves them with only a small amount of static power consumption, and reduces the dynamic power consumption of the MCM multichip.
Before the MCM multichip performs the shielding operation and low-power processing on Die2 and Die3, this embodiment further includes: reading the recorded number of Dies whose control links are normally connected, thereby obtaining the number of such Dies in the MCM multichip; then replanning and configuring the control links, or the control links and data links, of Die0 and Die1, and checking again the path connectivity between the Dies whose control links are normally connected. If the check fails, the processing flow exits directly; if the check passes, the flow proceeds to step 3. If the number of Dies whose control links are normally connected is one, the operating system skips this step.
Step 3: checking the data links of the MCM multichip, or checking the data links of the replanned MCM multichip.
In this embodiment, the data links of the MCM multichip are fully connected. As a specific example, suppose the replanned MCM multichip includes Die0, Die1, and Die2; that is, Die3 has been removed because its control link was not connected normally. In this case, step 3 includes: checking the data links of the replanned MCM multichip, and discarding the MCM multichip if the result shows that only the data link between Die0 and Die1 is normal, or only the data link between Die0 and Die2 is normal. Alternatively, step 3 includes: checking the data links of the replanned MCM multichip multiple times, and discarding the MCM multichip if the results are inconsistent, that is, if at least two different results are obtained. This step covers the case in which, after the single chips with faulty control links have been shielded, the replanned and configured single chips turn out to have data link problems; the MCM multichip is then discarded directly, which improves the efficiency of fault identification processing and prevents the process from entering an endless loop.
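The discard decision for a replanned MCM multichip can be sketched as follows, assuming fully connected data links are modeled as all unordered pairs of Dies. The function names and the `link_ok` callback are hypothetical, and the rule "discard whenever any pair fails" is a generalization of the two partial-failure examples given above, not a statement from the document.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

Pair = Tuple[int, int]


def check_data_links(
    dies: Sequence[int],
    link_ok: Callable[[int, int], bool],  # hypothetical pairwise check
) -> FrozenSet[Pair]:
    """Return the set of die pairs whose data link checks as normal."""
    return frozenset(p for p in combinations(sorted(dies), 2) if link_ok(*p))


def should_discard(
    dies: Sequence[int],
    link_ok: Callable[[int, int], bool],
    repetitions: int = 3,
) -> bool:
    """Decide whether a replanned MCM multichip should be discarded."""
    all_pairs = frozenset(combinations(sorted(dies), 2))
    results = [check_data_links(dies, link_ok) for _ in range(repetitions)]
    if len(set(results)) > 1:
        return True  # inconsistent results across repeated checks
    return results[0] != all_pairs  # assumed rule: any failed pair means discard


# Only the Die0-Die1 data link is normal after replanning.
print(should_discard([0, 1, 2], lambda a, b: (a, b) == (0, 1)))  # prints True
```

Returning a decision after a fixed number of repetitions is what keeps the procedure from looping indefinitely on an unstable part.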
If all control links among the Dies in the MCM multichip are normally connected, the step of checking the data links of the MCM multichip includes: checking the data links of the MCM multichip multiple times, and discarding the MCM multichip if the check results are inconsistent, that is, if at least two different check results exist. This step handles the case in which the control links in the MCM multichip are normal but a single chip has a data link problem and the faulty single chip cannot be identified quickly: the MCM multichip is discarded directly, which improves the efficiency of multi-chip module fault identification processing and prevents the MCM multichip from entering an endless loop during fault identification processing.
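The consistency rule for repeated checks can be sketched as below. The function name `repeated_check_is_consistent`, the `check_once` callback, and the default of three rounds are assumptions; the text only requires that the check be run multiple times and that any two differing results lead to discarding the MCM multichip.

```python
from typing import Callable, FrozenSet


def repeated_check_is_consistent(
    check_once: Callable[[], FrozenSet[int]],
    rounds: int = 3,
) -> bool:
    """Run the data-link check several times and compare the outcomes.

    check_once: hypothetical probe returning the set of die indices whose
                data links were found normal in one round.
    Returns False as soon as at least two different results exist, in which
    case the MCM multichip would be discarded.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    results = {check_once() for _ in range(rounds)}
    return len(results) == 1
```

Representing each round as a frozen set of die indices makes "at least two different check results" a simple set-size test.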
If all control links among the Dies in the MCM multichip are normally connected, the step of checking the data links of the MCM multichip may alternatively include: checking the data links of the MCM multichip, and, if the check result is that only the data links of Die0 and Die1 are normal, first performing the shielding operation and the related low-power processing on Die2 and Die3, and then re-planning and configuring the control links and data links of Die0 and Die1. The control link and data link between Die0 and Die1 are then checked again, and if the control link and/or the data link between Die0 and Die1 is abnormal, the MCM multichip is discarded. Thus, when a data link between Dies in the MCM multichip has a problem, the single chips with faulty data links are removed and the remaining single chips are re-planned and configured, which improves the utilization of the MCM multichip.
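The shield, re-plan, and re-check sequence above can be sketched as follows. All names (`isolate_data_link_faults` and the four callbacks) are hypothetical stand-ins for hardware operations that the text does not specify in detail.

```python
from typing import Callable, Iterable, List, Optional, Set


def isolate_data_link_faults(
    all_dies: Iterable[int],
    data_ok_dies: Set[int],
    mask: Callable[[int], None],
    low_power: Callable[[int], None],
    replan: Callable[[List[int]], None],
    recheck: Callable[[List[int]], bool],
) -> Optional[List[int]]:
    """Shield dies with faulty data links, then re-plan and re-check the rest.

    Returns the surviving dies, or None if the re-check fails and the
    MCM multichip should be discarded.
    """
    dies = list(all_dies)
    faulty = [d for d in dies if d not in data_ok_dies]
    survivors = [d for d in dies if d in data_ok_dies]
    for die in faulty:
        mask(die)       # shielding operation
        low_power(die)  # related low-power processing
    replan(survivors)   # re-plan control links and data links
    return survivors if recheck(survivors) else None
```

With four dies and `data_ok_dies={0, 1}`, Die2 and Die3 are shielded and put into low power before Die0 and Die1 are re-planned, which mirrors the order given in the text.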
In an optional embodiment, the control links and the data links between the Dies in the MCM multichip are fully connected, so that when a problem occurs in the control link and/or the data link of any Die in the MCM multichip, it is only necessary to perform the shielding operation and low-power processing on that control link and/or data link, and to re-plan and configure the control links and data links between the other Dies.
Step 4: based on the number of Dies whose control links and data links are normal, combined with the input voltage and load condition of the system, the maximum operating frequency of the MCM multichip is raised to compensate for the performance loss caused by shielding the faulty single chips. After the MCM multichip completes the subsequent initialization operations, the operating system is started; the operating system then runs normally using the resources on the secondary single chips, which are stably connected through control links and data links.
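The text does not give a formula for the frequency adjustment in step 4. Purely as an illustration, the sketch below assumes a policy in which the boost is proportional to the share of shielded Dies, scaled by the current load, and capped by a ceiling that the input voltage permits. The function name, every parameter, and the formula itself are hypothetical.

```python
def compensated_max_frequency_mhz(
    base_mhz: float,
    total_dies: int,
    healthy_dies: int,
    voltage_limit_mhz: float,
    load_fraction: float,
) -> float:
    """Hypothetical compensation policy (not taken from the source text).

    base_mhz:          maximum frequency with all dies working.
    voltage_limit_mhz: assumed ceiling allowed by the system input voltage.
    load_fraction:     assumed system load in the range [0.0, 1.0].
    """
    if not 0 < healthy_dies <= total_dies:
        raise ValueError("healthy_dies must be in 1..total_dies")
    if not 0.0 <= load_fraction <= 1.0:
        raise ValueError("load_fraction must be in [0.0, 1.0]")
    # Work that each surviving die would have to absorb, relative to its own share.
    lost_ratio = (total_dies - healthy_dies) / healthy_dies
    target = base_mhz * (1.0 + lost_ratio * load_fraction)
    # Never exceed what the input voltage allows.
    return min(target, voltage_limit_mhz)
```

Under these assumptions, a four-Die module with two Dies shielded, a 2000 MHz base, a 3000 MHz ceiling, and a load fraction of 0.25 would be set to 2500 MHz; at full load the result is clamped to the 3000 MHz ceiling.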
In a third aspect, the present embodiment provides a multi-chip module fault identification processing system 200. Referring to Fig. 7, the multi-chip module fault identification processing system 200 includes:
a start-up module 201 configured to start up an MCM multichip, the MCM multichip including at least one single chip;
a screening module 202 configured to screen a primary single chip with a stably connected control link from the at least one single chip by checking connectivity of the control link in the MCM multichip;
an adjusting module 203 configured to re-plan and configure the control link and the data link of the primary single chip;
a first checking module 204 configured to check a data link of the primary single chip;
a processing module 205 configured to take the primary single chip with the stably connected data link as a secondary single chip, and to implement the function of the MCM multichip through the secondary single chip.
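The cooperation of modules 201 to 205 can be sketched as a single pipeline. This is an illustrative arrangement only: `run_fault_identification` and its callback parameters are hypothetical, and each hardware check is reduced to a boolean probe.

```python
from typing import Callable, List


def run_fault_identification(
    dies: List[int],
    control_link_stable: Callable[[int], bool],
    replan: Callable[[List[int]], None],
    data_link_stable: Callable[[int], bool],
) -> List[int]:
    """Return the secondary single chips that will implement the MCM function."""
    # Start-up module 201: the MCM is assumed to be started with `dies` present.
    # Screening module 202: keep dies whose control link is stably connected.
    primary = [d for d in dies if control_link_stable(d)]
    # Adjusting module 203: re-plan control and data links of the primary dies.
    replan(primary)
    # First checking module 204: check the data link of each primary die.
    # Processing module 205: dies that pass become secondary single chips.
    secondary = [d for d in primary if data_link_stable(d)]
    return secondary
```

For instance, if Die3 fails the control link check and Die2 fails the data link check, the function returns `[0, 1]` for a four-Die module.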
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises:
a repeated execution module configured to, before the control link and the data link of the primary single chip are re-planned and configured, repeatedly execute the step of screening the primary single chip with the stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multi-chip, and to record the number of primary single chips screened out each time;
a selection module configured to select, as primary good single chips, the primary single chips screened out in the round with the smallest screened number;
the adjusting module 203 is further configured to re-plan and configure the control link and the data link of the primary good single chip;
the first checking module 204 is further configured to check a data link of the primary good single chip;
the processing module 205 is further configured to use the primary good single chip with the stably connected data link as a secondary single chip, and to implement the function of the MCM multi-chip through the secondary single chip.
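The repeated screening and selection performed by the repeated execution module and the selection module can be sketched as below. The names `select_primary_good` and `screen_once`, and the default of three rounds, are assumptions; ties are resolved in favor of the earliest round, which the text does not specify.

```python
from typing import Callable, FrozenSet, List


def select_primary_good(
    screen_once: Callable[[], FrozenSet[int]],
    rounds: int = 3,
) -> FrozenSet[int]:
    """Screen repeatedly and keep the result of the round with the fewest dies.

    screen_once: hypothetical probe returning the dies whose control link
                 was found stably connected in one screening round.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    results: List[FrozenSet[int]] = [screen_once() for _ in range(rounds)]
    # min() returns the first smallest result, so ties go to the earliest round.
    return min(results, key=len)
```

Choosing the smallest result is the conservative option: a Die that passes in some rounds but fails in another is excluded from the primary good single chips.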
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises: a first shielding module configured to shield a single chip, among the at least one single chip, whose control link is unstably connected.
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises: a first low-power processing module configured to perform low-power processing on a single chip, among the at least one single chip, whose control link is unstably connected.
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises: a second checking module configured to check the control link of the primary good single chip before the data link of the primary good single chip is checked, and to end the multi-chip module fault identification processing if the control link of at least one primary good single chip is unstable.
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises: a second shielding module configured to shield a primary good single chip whose data link is unstably connected;
the processing module 205 includes: an adjusting submodule configured to re-plan and configure the control link and the data link of the secondary single chip according to the number and the index of the secondary single chips; and a checking submodule configured to check whether the data link of the secondary single chip is stably connected, to end the multi-chip module fault identification processing if not, and, if so, to adjust the operating frequency of the MCM multi-chip according to the number of secondary single chips and the working state of the MCM multi-chip, so that the secondary single chips implement the functions of the MCM multi-chip.
In an alternative embodiment, the multi-chip module fault identification processing system 200 further comprises: a second low-power processing module configured to perform low-power processing on a primary good single chip whose data link is unstably connected.
According to the multi-chip module fault identification processing method and system provided by the embodiments of the invention, faulty chips in the MCM multi-chip are detected dynamically from the perspectives of both the control link and the data link, so that the utilization rate of the MCM multi-chip is effectively improved, the manufacturing cost of the MCM multi-chip is reduced, and the reliability of the MCM multi-chip is improved.
The above describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-chip module fault identification processing method is characterized by comprising the following steps:
starting an MCM multichip, the MCM multichip comprising at least one single chip;
screening a primary single chip with a stably connected control link from the at least one single chip by checking connectivity of the control link in the MCM multi-chip;
replanning and configuring a control link and a data link of the primary single chip;
checking a data link of the primary single chip;
and taking a primary single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multi-chip through the secondary single chip.
2. The multi-chip module fault identification processing method of claim 1, wherein before the replanning and configuring of the control link and the data link of the primary single chip, the method further comprises:
repeatedly executing the step of screening the primary single chip with the stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multi-chip, and recording the number of primary single chips screened out each time;
selecting the primary single chips screened out in the round with the smallest screened number as primary good single chips;
the replanning and configuring of the control link and the data link of the primary single chip comprises: replanning and configuring a control link and a data link of the primary good single chip;
the checking of the data link of the primary single chip comprises: checking a data link of the primary good single chip;
the taking of a primary single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multi-chip through the secondary single chip, comprises: taking a primary good single chip with a stably connected data link as a secondary single chip, and realizing the function of the MCM multi-chip through the secondary single chip.
3. The multi-chip module fault identification processing method of claim 1, further comprising:
performing a shielding operation on a single chip, among the at least one single chip, whose control link is unstably connected.
4. The multi-chip module fault identification processing method of claim 1, further comprising:
and performing low-power consumption processing on the single chip in which the control link is unstably connected in the at least one single chip.
5. The multi-chip module fault identification processing method of claim 2, wherein prior to the checking of the data link of the primary single chip, the method further comprises:
checking the control link of the primary good single chip, and if the control link of at least one primary good single chip is unstable, ending the multi-chip module fault identification processing method.
6. A multi-chip module fault identification processing system, comprising:
a start-up module configured to start up an MCM multichip, the MCM multichip comprising at least one single chip;
a screening module configured to screen a primary single chip with a stably connected control link from the at least one single chip by checking connectivity of the control link in the MCM multichip;
an adjusting module configured to re-plan and configure a control link and a data link of the primary single chip;
a first checking module configured to check a data link of the primary single chip;
a processing module configured to take a primary single chip with a stably connected data link as a secondary single chip, and to realize the functions of the MCM multi-chip through the secondary single chip.
7. The multi-chip module fault identification processing system of claim 6, further comprising:
a repeated execution module configured to, before the control link and the data link of the primary single chip are re-planned and configured, repeatedly execute the step of screening the primary single chip with the stably connected control link from the at least one single chip by checking the connectivity of the control link in the MCM multi-chip, and to record the number of primary single chips screened out each time;
a selection module configured to select, as primary good single chips, the primary single chips screened out in the round with the smallest screened number;
the adjusting module is further configured to re-plan and configure the control link and the data link of the primary good single chip;
the first checking module is further configured to check a data link of the primary good single chip;
the processing module is further configured to use a primary good single chip with a stably connected data link as a secondary single chip, and to implement the function of the MCM multi-chip through the secondary single chip.
8. The multi-chip module fault identification processing system of claim 6, further comprising:
a first shielding module configured to shield a single chip, among the at least one single chip, whose control link is unstably connected.
9. The multi-chip module fault identification processing system of claim 6, further comprising:
a first low-power processing module configured to perform low-power processing on a single chip, among the at least one single chip, whose control link is unstably connected.
10. The multi-chip module fault identification processing system of claim 7, further comprising:
a second checking module configured to check the control link of the primary good single chip before the data link of the primary good single chip is checked, and to end the multi-chip module fault identification processing if the control link of at least one primary good single chip is unstable.
CN202110249196.8A 2021-03-05 2021-03-05 Multi-chip module fault identification processing method and system Active CN113051111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110249196.8A CN113051111B (en) 2021-03-05 2021-03-05 Multi-chip module fault identification processing method and system

Publications (2)

Publication Number Publication Date
CN113051111A (en) 2021-06-29
CN113051111B (en) 2022-06-24

Family

ID=76510639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110249196.8A Active CN113051111B (en) 2021-03-05 2021-03-05 Multi-chip module fault identification processing method and system

Country Status (1)

Country Link
CN (1) CN113051111B (en)

Also Published As

Publication number Publication date
CN113051111B (en) 2022-06-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant