CN113419888A - PCIe link repair method, device, equipment and storage medium - Google Patents

Publication number
CN113419888A
Authority
CN
China
Prior art keywords
pcie link
target
target pcie
fatal
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110705108.0A
Other languages
Chinese (zh)
Inventor
李长飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110705108.0A priority Critical patent/CN113419888A/en
Publication of CN113419888A publication Critical patent/CN113419888A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/221Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test buses, lines or interfaces, e.g. stuck-at or open line faults
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • G06F13/4004Coupling between buses
    • G06F13/4022Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a PCIe link repair method based on the observation that the time distribution of non-fatal errors reflects how serious the problems of a PCIe link are. When a non-fatal error occurs on a target PCIe link, the method acquires the occurrence time of the current error and of all previously recorded non-fatal errors, and executes a repair procedure for the target PCIe link once these occurrence times satisfy a preset time distribution condition, so as to restore the target PCIe link to a normal state. The invention also discloses a PCIe link repair apparatus, a PCIe link repair device, and a computer-readable storage medium, which have the same beneficial effects as the PCIe link repair method.

Description

PCIe link repair method, device, equipment and storage medium
Technical Field
The present invention relates to the field of PCIe links, and in particular, to a method for repairing a PCIe link, and further, to a device, an apparatus, and a computer-readable storage medium for repairing a PCIe link.
Background
In recent years, PCIe (Peripheral Component Interconnect Express) devices have been widely used in servers and storage systems. Various non-fatal errors (usually manifested as packet errors) may occur on the PCIe link between a motherboard and a PCIe device. Current PCIe devices can generally correct such non-fatal errors and record them. The prior art, however, corrects only the individual errors. In practice, once non-fatal errors become frequent, a serious unrepairable failure of the PCIe link often follows soon afterwards, and the system may even crash. As a result, system reliability is poor and maintenance costs are high.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a PCIe link repair method, which can reduce the number of non-fatal errors on one hand, and prevent unrepairable serious faults on the other hand, thereby improving the reliability of system operation and reducing the maintenance cost; another object of the present invention is to provide a PCIe link repair apparatus, device, and computer-readable storage medium, which can reduce the number of non-fatal errors, and prevent the occurrence of irreparable serious failures, thereby improving the reliability of system operation and reducing the maintenance cost.
In order to solve the above technical problem, the present invention provides a PCIe link repair method, including:
when a target PCIe link has a non-fatal error, acquiring the occurrence time of the non-fatal error;
acquiring the occurrence time of all the non-fatal errors of the target PCIe link recorded before;
determining whether all currently recorded occurrence times of the target PCIe link satisfy a preset time distribution condition;
and if so, executing the repair procedure of the target PCIe link.
Preferably, when a non-fatal error occurs in the target PCIe link, the acquiring the occurrence time of the current non-fatal error specifically includes:
when the PCIe device corresponding to the target PCIe link changes the value of its non-fatal error register, acquiring and recording the occurrence time of the current non-fatal error.
Preferably, the preset time distribution condition includes:
there is a first preset number of said non-fatal errors occurring within a first preset duration; the number of the time counting units in which the non-fatal errors occur in a second preset time length reaches a second preset number;
the time length of the time counting units is a third preset time length, all the time counting units are continuous and have no intersection, and the second preset time length is longer than the first preset time length;
if yes, after executing the repair program of the target PCIe link, the method for repairing the PCIe link further includes:
and clearing all the occurrence time of the target PCIe link which is recorded currently.
Preferably, the executing the repair program of the target PCIe link specifically includes:
if the non-fatal errors of the first preset number occur within the first preset duration, retraining the target PCIe link;
if the number of the time counting units with the non-fatal errors occurring in a second preset time length reaches a second preset number, judging whether the current data transmission rate of the target PCIe link is greater than a preset threshold value or not;
if so, setting the data transmission rate of the target PCIe link as the preset threshold;
and if not, disabling the target PCIe link.
Preferably, if not, after disabling the target PCIe link, the method for repairing the PCIe link further includes:
controlling a prompter to indicate that the target PCIe link has a fault.
Preferably, the prompter is a display.
Preferably, the non-fatal errors include transaction layer packet errors and data link layer packet errors.
In order to solve the above technical problem, the present invention further provides a PCIe link repair apparatus, including:
the first acquisition module is used for acquiring the occurrence time of the current non-fatal error when the target PCIe link has the non-fatal error;
the second obtaining module is used for obtaining the occurrence time of all the non-fatal errors of the target PCIe link recorded before;
the judging module is used for judging whether the occurrence time of all the target PCIe links recorded at present meets a preset time distribution condition or not, and if so, triggering the executing module;
the execution module is used for executing the repair program of the target PCIe link.
In order to solve the above technical problem, the present invention further provides a PCIe link repair device, including:
a memory for storing a computer program;
a processor for implementing the steps of the repair method for PCIe links as described above when executing the computer program.
To solve the above technical problem, the present invention further provides a computer-readable storage medium, having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the repair method for PCIe links as described above.
The invention provides a PCIe link repair method based on the observation that the time distribution of non-fatal errors reflects how serious the problems of a PCIe link are. When a non-fatal error occurs on the target PCIe link, the method acquires the occurrence time of the current error and of all previously recorded non-fatal errors, and executes the repair procedure of the target PCIe link once these occurrence times satisfy the preset time distribution condition, so as to restore the target PCIe link to a normal state.
The invention also provides a device and equipment for repairing the PCIe link and a computer readable storage medium, which have the same beneficial effects as the repairing method of the PCIe link.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the prior art and the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flowchart of a PCIe link repair method provided in the present invention;
Fig. 2 is a schematic structural diagram of a repair apparatus for a PCIe link according to the present invention;
Fig. 3 is a schematic structural diagram of a repair device for a PCIe link according to the present invention.
Detailed Description
The core of the invention is to provide a method for repairing PCIe link, which can reduce the number of non-fatal errors on one hand, and can prevent the occurrence of unrepairable serious faults on the other hand, thereby improving the reliability of system operation and reducing the maintenance cost; another core of the present invention is to provide a PCIe link repair apparatus, device, and computer-readable storage medium, which can reduce the number of non-fatal errors, and prevent an unrepairable critical failure, thereby improving the reliability of system operation and reducing the maintenance cost.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a repair method for a PCIe link provided in the present invention, where the repair method for the PCIe link includes:
step S1: when a target PCIe link has a non-fatal error, acquiring the occurrence time of the non-fatal error;
Specifically, in view of the technical problem described in the background, the embodiment of the present invention analyzes the distribution of the occurrence times of the non-fatal errors on the target PCIe link. When that distribution indicates a high probability of a serious unrepairable failure, the target PCIe link is repaired so that it returns to normal, which prevents the serious unrepairable failure and improves the reliability of system operation.
The target PCIe link may be a PCIe link between the motherboard and any type of PCIe device, for example a graphics card; the embodiment of the present invention is not limited in this respect.
Step S2: acquiring the occurrence time of all non-fatal errors of a target PCIe link recorded before;
Specifically, in the embodiment of the present invention, the occurrence time of each non-fatal error is recorded when the error occurs. This step therefore acquires the occurrence times of all previously recorded non-fatal errors of the target PCIe link, so that the subsequent steps can analyze the distribution of occurrence times using all currently recorded occurrence times as the data basis.
The executing entity in the embodiment of the present invention may be, for example, a CPU on the motherboard; the embodiment of the present invention is not limited in this respect.
Step S3: judging whether all occurrence time of the target PCIe link recorded currently meets a preset time distribution condition or not;
Specifically, the embodiment of the present invention presets a time distribution condition, for example one indicating that non-fatal errors occur frequently on the target PCIe link and that an unrepairable serious failure is therefore likely.
Step S4: and if so, executing the repair program of the target PCIe link.
Specifically, when the preset time distribution condition is met, the frequency of non-fatal errors on the target PCIe link has reached a dangerous level, and the link may suffer an unrepairable serious failure at any time. The repair procedure of the target PCIe link can then be executed so that the link returns to normal, which improves the reliability of system operation and reduces the frequency of non-fatal errors.
The invention provides a PCIe link repair method based on the observation that the time distribution of non-fatal errors reflects how serious the problems of a PCIe link are. When a non-fatal error occurs on the target PCIe link, the method acquires the occurrence time of the current error and of all previously recorded non-fatal errors, and executes the repair procedure of the target PCIe link once these occurrence times satisfy the preset time distribution condition, so as to restore the target PCIe link to a normal state.
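Steps S1 to S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `history` store, the function name `on_non_fatal_error`, and the two callbacks `meets_condition` and `repair` are hypothetical names introduced only for illustration, and the preset time distribution condition is passed in rather than fixed.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory store: link id -> recorded occurrence times (seconds).
history: Dict[str, List[float]] = {}


def on_non_fatal_error(
    link_id: str,
    occurred_at: float,
    meets_condition: Callable[[List[float]], bool],
    repair: Callable[[str], None],
) -> bool:
    """Run steps S1-S4 for one non-fatal error; return True if repair ran."""
    # S2: fetch the occurrence times recorded earlier for this link.
    times = history.setdefault(link_id, [])
    # S1: add the occurrence time of the current error to the record.
    times.append(occurred_at)
    # S3: test all currently recorded times against the preset condition.
    if meets_condition(times):
        # S4: execute the repair procedure for the target link.
        repair(link_id)
        return True
    return False
```

A caller would supply its own condition and repair routine, for example `on_non_fatal_error("link0", 12.5, my_condition, my_repair)`, where both callbacks are placeholders.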
On the basis of the above-described embodiment:
as a preferred embodiment, when a non-fatal error occurs in a target PCIe link, acquiring the occurrence time of the current non-fatal error specifically includes:
when the PCIe device corresponding to the target PCIe link changes the value of its non-fatal error register, acquiring and recording the occurrence time of the current non-fatal error.
Specifically, the PCIe device itself records non-fatal errors on the target PCIe link in a register, so the CPU can read the register value directly to determine whether a non-fatal error has occurred and the corresponding occurrence time, which is simple to implement.
It should be noted that the interval at which the register value is monitored may be set as required, for example once every 200 ms. A change in the register value indicates that a non-fatal error has occurred on the target PCIe link. The embodiment of the present invention is not limited in this respect.
Of course, in addition to this implementation manner, the "obtaining the occurrence time of the current non-fatal error when the target PCIe link has the non-fatal error" may also be implemented in other specific manners, and the embodiment of the present invention is not limited herein.
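The register-polling approach described above might look like the following sketch. It assumes a caller-supplied `read_register` function that returns the current value of the device's non-fatal error register; that function, along with `poll_error_register` and its parameters, is hypothetical, since the source does not say how the register is accessed. The clock and sleep functions are injectable so the loop can be exercised without hardware.

```python
import time
from typing import Callable, List


def poll_error_register(
    read_register: Callable[[], int],
    record: List[float],
    polls: int,
    interval_s: float = 0.2,  # 200 ms, the example interval from the text
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the register `polls` times; log a timestamp whenever it changes."""
    last = read_register()
    for _ in range(polls):
        sleep(interval_s)
        value = read_register()
        if value != last:
            # A changed value is treated as one non-fatal error occurrence.
            record.append(clock())
            last = value
```

With this scheme the recorded occurrence time is the time at which the change was detected, so it is only accurate to within one polling interval.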
As a preferred embodiment, the preset time distribution condition includes:
a first preset number of non-fatal errors occur within a first preset duration; the number of time counting units in which a non-fatal error occurs within a second preset duration reaches a second preset number;
the length of each time counting unit is a third preset duration, the time counting units are consecutive and do not overlap, and the second preset duration is longer than the first preset duration;
if so, after executing the repair procedure of the target PCIe link, the repair method of the PCIe link further includes:
clearing all currently recorded occurrence times of the target PCIe link.
Specifically, the first time distribution condition is that "a first preset number of non-fatal errors occur within a first preset duration", for example 4 errors within 20 s. This condition detects a large number of non-fatal errors concentrated in a short time, which indicates that the target PCIe link may have a serious fault.
The second time distribution condition is that "the number of time counting units in which a non-fatal error occurs within the second preset duration reaches the second preset number". For example, with 20 s as the time counting unit, the condition may be that the number of time counting units containing a non-fatal error within 1 hour reaches 50. This condition detects non-fatal errors that keep occurring at a certain frequency over a long period (the second preset duration). Such errors are no longer accidental; they are likely to recur regularly because of a problem in the target PCIe link itself, which also indicates that the link may fail seriously at any time.
Of course, the preset time distribution condition may include other specific conditions besides the two specific conditions, and the embodiment of the present invention is not limited herein.
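The two example conditions can be sketched in Python as follows. The function names and default values (4 errors in 20 s; 50 counting units of 20 s within 1 hour) follow the examples in the text, but several details are assumptions not stated in the source: the window boundaries are treated as inclusive, counting units are aligned to multiples of `unit_s` on the timestamp axis, and the long window is measured backwards from a caller-supplied `now`.

```python
from typing import List


def burst_condition(
    times: List[float], count: int = 4, window_s: float = 20.0
) -> bool:
    """First condition: some `count` errors fall within `window_s` seconds."""
    ts = sorted(times)
    return any(
        ts[i + count - 1] - ts[i] <= window_s
        for i in range(len(ts) - count + 1)
    )


def sustained_condition(
    times: List[float],
    now: float,
    unit_s: float = 20.0,
    window_s: float = 3600.0,
    min_units: int = 50,
) -> bool:
    """Second condition: enough distinct counting units contain an error."""
    # Each timestamp maps to exactly one unit, so units never overlap.
    units = {int(t // unit_s) for t in times if now - window_s < t <= now}
    return len(units) >= min_units
```

Counting distinct units rather than raw errors means a single burst inside one 20 s unit contributes only once to the second condition, which is what separates "persistent over a long period" from "many errors at once".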
As a preferred embodiment, the executing the repair procedure of the target PCIe link specifically includes:
if the non-fatal errors with the first preset number occur within the first preset duration, retraining the target PCIe link;
if the number of the time counting units with the non-fatal errors in the second preset time length reaches a second preset number, judging whether the current data transmission rate of the target PCIe link is greater than a preset threshold value or not;
if so, setting the data transmission rate of the target PCIe link as a preset threshold value;
if not, the target PCIe link is disabled.
Specifically, the situation corresponding to the first time distribution condition is probably caused by a transient, sudden fault of the target PCIe link and may have little to do with the data transmission rate. When it occurs, the target PCIe link may therefore be retrained (the target PCIe link is disabled and then enabled again) so that it can return to normal.
Specifically, the higher the data transmission rate, the greater the probability of non-fatal errors, so the situation corresponding to the second time distribution condition may be caused by an excessively high data transmission rate. When it occurs, the embodiment of the present invention determines whether the current data transmission rate of the target PCIe link is greater than the preset threshold. If so, the data transmission rate of the target PCIe link is set to the preset threshold in an attempt to reduce the frequency of non-fatal errors. If not, the situation is probably not caused by an excessively high data transmission rate, and the target PCIe link may be disabled directly for safety.
The preset threshold may be set autonomously, for example, may be 2.5GT/s, and the embodiment of the present invention is not limited herein.
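The choice of repair action can be summarized in a small decision function. This is a sketch under stated assumptions: the `Action` enum and `choose_repair` are hypothetical names, the function only selects an action and does not touch hardware, and checking the first condition before the second is an assumption, since the source does not say which takes priority when both hold.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RETRAIN = "retrain"          # disable the link, then enable it again
    LIMIT_SPEED = "limit_speed"  # set the data rate to the preset threshold
    DISABLE = "disable"          # disable the link


def choose_repair(
    burst: bool,
    sustained: bool,
    current_gts: float,
    threshold_gts: float = 2.5,  # example threshold from the text, in GT/s
) -> Action:
    """Map the two time distribution conditions to a repair action."""
    if burst:
        # Assumed priority: a short burst is handled by retraining.
        return Action.RETRAIN
    if sustained:
        if current_gts > threshold_gts:
            return Action.LIMIT_SPEED
        return Action.DISABLE
    return Action.NONE
```

For example, a link running at 8.0 GT/s that meets only the second condition would be limited to the threshold, while a link already at 2.5 GT/s would be disabled.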
As a preferred embodiment, if not, after disabling the target PCIe link, the method for repairing the PCIe link further includes:
controlling a prompter to indicate that the target PCIe link has a fault.
Specifically, in order to facilitate workers to find the fault condition of the target PCIe link in time and take countermeasures to recover the system as soon as possible, in the embodiment of the present invention, the prompter may be controlled to prompt that the target PCIe link has a fault after the target PCIe link is disabled.
As a preferred embodiment, the prompter is a display.
Specifically, a display is hardware that a server or storage system already has, so no additional cost is incurred, and it prompts effectively.
Of course, besides the display, the prompter may be of other various types, and the embodiment of the present invention is not limited herein.
As a preferred embodiment, the non-fatal errors include transaction layer packet errors and data link layer packet errors.
Specifically, transaction layer packet errors and data link layer packet errors are two types of non-fatal error with a high probability of occurrence.
Of course, the non-fatal errors may also include types other than these two; the embodiment of the present invention is not limited in this respect.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a repair apparatus for a PCIe link according to the present invention, where the repair apparatus for a PCIe link includes:
the first obtaining module 21 is configured to obtain occurrence time of the current non-fatal error when a target PCIe link has a non-fatal error;
a second obtaining module 22, configured to obtain occurrence times of all non-fatal errors of the target PCIe link recorded before;
the judging module 23 is configured to judge whether all occurrence times of the currently recorded target PCIe link meet a preset time distribution condition, and if yes, trigger the executing module;
and the execution module 24 is used for executing the repair program of the target PCIe link.
For the description of the repair apparatus for a PCIe link according to the embodiment of the present invention, please refer to the foregoing embodiment of the repair method for a PCIe link, which is not described herein again.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a repair device for a PCIe link according to the present invention, where the repair device for a PCIe link includes:
a memory 31 for storing a computer program;
a processor 32 for implementing the steps of the repair method for PCIe links as in the foregoing embodiments when executing the computer program.
For introducing the repair device for the PCIe link provided in the embodiment of the present invention, please refer to the foregoing embodiment of the repair method for the PCIe link, which is not described herein again.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the repair method for PCIe links as in the preceding embodiments.
For a description of the computer-readable storage medium provided in the embodiment of the present invention, please refer to the foregoing embodiment of the repair method for the PCIe link; it is not repeated here.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for repairing a PCIe link, comprising:
when a target PCIe link has a non-fatal error, acquiring the occurrence time of the non-fatal error;
acquiring the occurrence time of all the non-fatal errors of the target PCIe link recorded before;
determining whether all currently recorded occurrence times of the target PCIe link satisfy a preset time distribution condition;
and if so, executing the repair procedure of the target PCIe link.
2. The method for repairing a PCIe link according to claim 1, wherein, when a non-fatal error occurs in the target PCIe link, the acquiring the occurrence time of the current non-fatal error specifically includes:
when the PCIe device corresponding to the target PCIe link changes the value of its non-fatal error register, acquiring and recording the occurrence time of the current non-fatal error.
3. The repair method for the PCIe link according to claim 2, wherein the preset time distribution condition comprises:
there is a first preset number of said non-fatal errors occurring within a first preset duration; the number of the time counting units in which the non-fatal errors occur in a second preset time length reaches a second preset number;
the time length of the time counting units is a third preset time length, all the time counting units are continuous and have no intersection, and the second preset time length is longer than the first preset time length;
if yes, after executing the repair program of the target PCIe link, the method for repairing the PCIe link further includes:
and clearing all the occurrence time of the target PCIe link which is recorded currently.
4. The method of claim 3, wherein the executing the repair procedure for the target PCIe link is specifically:
if the non-fatal errors of the first preset number occur within the first preset duration, retraining the target PCIe link;
if the number of the time counting units with the non-fatal errors occurring in a second preset time length reaches a second preset number, judging whether the current data transmission rate of the target PCIe link is greater than a preset threshold value or not;
if so, setting the data transmission rate of the target PCIe link as the preset threshold;
and if not, disabling the target PCIe link.
5. The method of claim 4, wherein if not, after disabling the target PCIe link, the method further comprises:
controlling a prompter to indicate that the target PCIe link has a fault.
6. The method of repairing a PCIe link of claim 5, wherein the prompter is a display.
7. The method of repairing a PCIe link according to any of claims 1-6, wherein the non-fatal errors include transaction layer packet errors and data link layer packet errors.
8. A repair apparatus for a PCIe link, comprising:
the first acquisition module is used for acquiring the occurrence time of the current non-fatal error when the target PCIe link has the non-fatal error;
the second obtaining module is used for obtaining the occurrence time of all the non-fatal errors of the target PCIe link recorded before;
the judging module is used for judging whether the occurrence time of all the target PCIe links recorded at present meets a preset time distribution condition or not, and if so, triggering the executing module;
the execution module is used for executing the repair program of the target PCIe link.
9. A repair device for a PCIe link, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the PCIe link repair method as recited in any one of claims 1 to 7 when said computer program is executed.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the repair method for a PCIe link according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110705108.0A CN113419888A (en) 2021-06-24 2021-06-24 PCIe link repair method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110705108.0A CN113419888A (en) 2021-06-24 2021-06-24 PCIe link repair method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113419888A (en) 2021-09-21

Family

ID=77717626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110705108.0A Withdrawn CN113419888A (en) 2021-06-24 2021-06-24 PCIe link repair method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113419888A (en)

Similar Documents

Publication Publication Date Title
US10430260B2 (en) Troubleshooting method, computer system, baseboard management controller, and system
CN111008091A (en) Fault processing method, system and related device for memory CE
CN108154230B (en) Monitoring method and monitoring device of deep learning processor
CN108958965B (en) Method, device and equipment for monitoring recoverable ECC errors by BMC
US20220129338A1 (en) Chip Fault Diagnosis Method, Chip Fault Diagnosis Device, Computer-Readable Storage Medium and Electronic Equipment
CN114676019B (en) Method, device, equipment and storage medium for monitoring state of central processing unit
CN115981898A (en) Error-correctable error processing method, device and equipment for memory and readable storage medium
KR101936240B1 (en) Preventive maintenance simulation system and method
CN106201753B (en) Method and system for processing PCIE errors in linux
CN116820820A (en) Server fault monitoring method and system
CN117076186B (en) Memory fault detection method, system, device, medium and server
EP3358467A1 (en) Fault processing method, computer system, baseboard management controller and system
KR102213676B1 (en) Terminal apparatus for autosar system with arithmetic operation supervision function and arithmetic operation supervision method of autosar system
CN113419888A (en) PCIe link repair method, device, equipment and storage medium
CN111124818B (en) Monitoring method, device and equipment for Expander
CN111159051B (en) Deadlock detection method, deadlock detection device, electronic equipment and readable storage medium
CN110908839A (en) Method, device and equipment for relieving fault of logic module
CN112763813A (en) Apparatus and method for detecting cause of battery discharge of vehicle
CN115904772A (en) Error determination method, device, equipment and storage medium for PCIe link
CN113625957B (en) Method, device and equipment for detecting hard disk faults
CN114328141A (en) Hard disk fault early warning method and related components
CN112799911A (en) Node health state detection method, device, equipment and storage medium
CN112256539A (en) PCIE link error statistical method, device, terminal and storage medium
CN111124729A (en) Fault disk determination method, device, equipment and computer readable storage medium
CN111949485A (en) SAS port monitoring method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210921