CN112256539B - PCIE link error statistical method, device, terminal and storage medium - Google Patents


Info

Publication number
CN112256539B
Authority
CN
China
Prior art keywords
fatal error
error count
pcie link
change
fatal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010990038.3A
Other languages
Chinese (zh)
Other versions
CN112256539A (en)
Inventor
李长飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010990038.3A priority Critical patent/CN112256539B/en
Publication of CN112256539A publication Critical patent/CN112256539A/en
Application granted granted Critical
Publication of CN112256539B publication Critical patent/CN112256539B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/349Performance evaluation by tracing or monitoring for interfaces, buses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3027Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823Errors, e.g. transmission errors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/16Threshold monitoring

Abstract

The invention discloses a PCIE link error statistical method, device, terminal and storage medium. The method monitors the non-fatal error count of a PCIE link in real time. When the count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, an alarm is raised and/or the PCIE link is interrupted. When the count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, an alarm is raised and/or the PCIE link is interrupted. A change within the first preset duration counts as one segment of change. The invention gathers statistics along two dimensions, the number of errors and the time at which they occur, and raises an alarm or interrupts the link when the errors meet the statistical conditions. This avoids serious system faults caused by excessive errors and improves the stability and reliability of system operation.

Description

PCIE link error statistical method, device, terminal and storage medium
Technical Field
The invention relates to the field of PCIE link monitoring, in particular to a PCIE link error statistical method, a device, a terminal and a storage medium.
Background
In recent years, as user requirements for integration, unification, efficiency, space and energy consumption have risen, PCIE (Peripheral Component Interconnect Express) devices have come into wide use in servers and storage. Effectively monitoring the health of a PCIE link, and applying a protection policy based on what is observed, therefore improves the stability and reliability of system operation. Most PCIE devices already expose error data, but how to use that data to judge the health of a link remains a difficulty in the field, and no effective error statistical method is currently available.
Disclosure of Invention
To solve the above problems, the present invention provides a PCIE link error statistical method, device, terminal and storage medium that gather reasonable statistics on non-fatal errors on a PCIE link, so as to avoid serious system failures caused by too many errors.
The technical scheme of the invention is as follows: a PCIE link error statistical method comprises the following steps:
monitoring the non-fatal error count of the PCIE link in real time;
when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raising an alarm and/or interrupting the PCIE link;
when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raising an alarm and/or interrupting the PCIE link; a change within the first preset duration counts as one segment of change.
Further, the method also comprises the following steps:
applying for a plurality of object pools, the number of object pools being equal to the second segment-number threshold;
when a change in the non-fatal error count is observed, recording monitoring information in the corresponding object pool;
if the non-fatal error count changes continuously within the first preset duration, continuing to update the monitoring information in the current object pool;
if the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, moving to the next object pool to record monitoring information, the object pools being used cyclically in order, with each pass overwriting earlier entries;
if all object pools have been used, with overwriting, within the second preset duration, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
Further, the recorded monitoring information includes: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment.
Further, the non-fatal error count includes a data link layer packet error count and a transaction layer packet error count.
The technical scheme of the invention also comprises a PCIE link error statistical device, which comprises,
a count monitoring module: monitoring the non-fatal error count of the PCIE link in real time;
a first exception handling module: when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raising an alarm and/or interrupting the PCIE link;
a second exception handling module: when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raising an alarm and/or interrupting the PCIE link; a change within the first preset duration counts as one segment of change.
Further, the device also comprises:
an object pool application module: applying for a plurality of object pools, the number of object pools being equal to the second segment-number threshold;
a monitoring information recording module: when a change in the non-fatal error count is observed, recording monitoring information in the corresponding object pool; if the non-fatal error count changes continuously within the first preset duration, continuing to update the monitoring information in the current object pool; if the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, moving to the next object pool to record monitoring information, the object pools being used cyclically in order, with each pass overwriting earlier entries;
the second exception handling module monitors whether all object pools have been used, with overwriting, within the second preset duration; if so, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
Further, the recorded monitoring information includes: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment.
Further, the non-fatal error count includes a data link layer packet error count and a transaction layer packet error count.
The technical scheme of the invention also comprises a terminal, which comprises:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform any of the methods described above.
The invention also comprises a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method as defined in any one of the above.
The PCIE link error statistical method, device, terminal and storage medium provided by the invention gather effective and reasonable statistics on non-fatal errors occurring on a link. Statistics are gathered along two dimensions, the number of errors and the time at which they occur. When the errors meet the statistical conditions, an alarm is raised or the link is interrupted, which avoids serious system faults caused by excessive errors and improves the stability and reliability of system operation. The method fills a gap in PCIE link error statistics and does not depend on the type of PCIE device, so it is broadly applicable.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an object pool architecture according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of the structure of the second embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings by way of specific examples. The examples illustrate the invention, and the invention is not limited to the following embodiments.
As shown in fig. 1, the present embodiment provides a PCIE link error statistical method, including the following steps:
S1, monitoring the non-fatal error count of the PCIE link in real time;
S2, when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raising an alarm and/or interrupting the PCIE link;
S3, when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raising an alarm and/or interrupting the PCIE link; a change within the first preset duration counts as one segment of change.
The method sets a first preset duration, a first count threshold, a second preset duration and a second segment-number threshold. When the count (that is, the non-fatal error count) changes, if it keeps changing within the first preset duration and the number of consecutive changes exceeds the first count threshold, an anomaly has occurred and exception handling is required. Likewise, if more segments of change than the second segment-number threshold occur within the second preset duration, an anomaly has occurred and must be handled. Note that a segment may consist of a single change or of several consecutive changes. In the consecutive case the number of changes does not exceed the first count threshold; otherwise an alarm would already have been raised and/or the PCIE link interrupted before the second preset duration elapsed.
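The two conditions can be illustrated with a minimal Python sketch that works on a list of change timestamps. It rests on several assumptions that are not stated in the source: a segment is taken to be a run of changes whose consecutive gaps do not exceed the first preset duration, "exceeds" is read as a strict comparison, and the second condition counts segments whose first change falls inside any window as long as the second preset duration. The names `split_into_segments` and `is_abnormal` are hypothetical.

```python
from typing import List


def split_into_segments(change_times: List[float], gap_s: float) -> List[List[float]]:
    """Group sorted change timestamps into segments.

    Assumption: a gap larger than gap_s between two changes starts a new segment.
    """
    segments: List[List[float]] = []
    for t in change_times:
        if segments and t - segments[-1][-1] <= gap_s:
            segments[-1].append(t)      # still the same segment of continuous change
        else:
            segments.append([t])        # gap too long: open a new segment
    return segments


def is_abnormal(change_times: List[float], gap_s: float, max_changes: int,
                window_s: float, max_segments: int) -> bool:
    """Return True if either statistical condition is met."""
    segments = split_into_segments(sorted(change_times), gap_s)

    # Condition 1: one segment contains more changes than the first count threshold.
    if any(len(seg) > max_changes for seg in segments):
        return True

    # Condition 2: more than max_segments segments start within one window of window_s.
    starts = [seg[0] for seg in segments]
    left = 0
    for right, start in enumerate(starts):
        while start - starts[left] > window_s:
            left += 1                   # slide the window forward
        if right - left + 1 > max_segments:
            return True
    return False
```

For instance, five changes spaced 5 s apart form one segment with more than 4 changes and trip the first condition, while 52 isolated changes spaced 60 s apart form 52 segments inside one hour and trip the second.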
With this method, non-fatal errors (Uncorrectable Errors) occurring on the link are counted reasonably and effectively along two dimensions, the number of errors and the time at which they occur. When the errors meet the statistical conditions, an alarm is raised or the link is interrupted, which avoids serious system faults caused by excessive errors.
In this embodiment, monitoring information for the non-fatal error count is stored and tallied in object pools. Several object pools are first applied for; their number equals the second segment-number threshold, so that the number of segments of count change within the second preset duration can be tallied. The pools are ordered by serial number and used cyclically with overwriting. For example, the monitoring information for the first segment is stored in the first object pool and that for the second segment in the second object pool; if there are M object pools and the count has already changed in M segments, the monitoring information for segment M+1 is stored in the first object pool, overwriting what that pool held before.
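The cyclic use of the pools amounts to modular indexing. A small Python sketch, assuming segments are numbered from 1 and pools from 0 (the function name `pool_index_for_segment` is hypothetical):

```python
def pool_index_for_segment(segment_number: int, pool_count: int) -> int:
    """Return the zero-based pool index that stores the given 1-based segment."""
    if segment_number < 1 or pool_count < 1:
        raise ValueError("segment_number and pool_count must be positive")
    # Segment M+1 wraps around to pool 0 and overwrites its earlier contents.
    return (segment_number - 1) % pool_count
```

With M = 51 pools, segment 51 lands in pool 50 and segment 52 wraps back to pool 0.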
When a change in the non-fatal error count is observed, monitoring information is recorded in the corresponding object pool. The monitoring information includes: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment.
If the non-fatal error count changes continuously within the first preset duration, the monitoring information in the current object pool continues to be updated. If the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, recording moves to the next object pool, and the object pools are used cyclically in order, with each pass overwriting earlier entries.
On this basis, if all object pools have been used, with overwriting, within the second preset duration, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
A specific implementation is provided below to further understand the present solution.
As shown in FIG. 2, this implementation sets up 51 object pools; the first preset duration is 20 seconds, the second preset duration is 1 hour, the first count threshold is 4, and the second segment-number threshold is 51.
In addition, the non-fatal error count includes a data link layer packet error count (Bad DLLP Count) and a transaction layer packet error count (Bad TLP Count).
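These settings could be gathered into a single configuration object. The following Python sketch is illustrative only; the class name `LinkErrorPolicy` and its field names are hypothetical, and only the four numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkErrorPolicy:
    """Hypothetical container for the four settings of this implementation."""
    segment_gap_s: float = 20.0        # first preset duration
    window_s: float = 3600.0           # second preset duration (1 hour)
    max_changes_per_segment: int = 4   # first count threshold
    segment_threshold: int = 51        # second segment-number threshold

    @property
    def pool_count(self) -> int:
        # The number of object pools equals the second segment-number threshold.
        return self.segment_threshold
```

Deriving `pool_count` from `segment_threshold` keeps the two values from drifting apart if the threshold is later tuned.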
The system applies for 51 statistical object pools in total, numbered 0 to 50. The 51 pools are used cyclically with overwriting, starting from pool 0. Each object pool holds a description of the error, containing: the current error count time (that is, the time at which a change in the non-fatal error count was last observed), the Bad TLP Count, the Bad DLLP Count, and the number of changes tallied (that is, the number of times the non-fatal error count changed in that segment).
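One possible shape for the per-pool description is sketched below in Python. The class name `PoolRecord`, its field names and the `update` helper are assumptions made for illustration; the source only lists which four pieces of information each pool holds.

```python
from dataclasses import dataclass


@dataclass
class PoolRecord:
    """Hypothetical layout of the description stored in one object pool."""
    last_change_time: float   # time at which a count change was last observed
    bad_tlp_count: int        # latest Bad TLP Count
    bad_dllp_count: int       # latest Bad DLLP Count
    change_count: int = 1     # number of changes tallied in this segment

    def update(self, now: float, bad_tlp: int, bad_dllp: int) -> None:
        """Record a further change that belongs to the same segment."""
        self.last_change_time = now
        self.bad_tlp_count = bad_tlp
        self.bad_dllp_count = bad_dllp
        self.change_count += 1
```

A list of 51 such records (or `None` for pools that have not been used yet) would then stand in for pools 0 to 50.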
The statistical process is as follows:
(1) the system reads the non-fatal error count values (Bad TLP Count and Bad DLLP Count) on the PCIE link in a loop, and tallies a change whenever either of the two changes (that is, whenever the count changes);
(2) if the count keeps changing within 20 s, the current object pool is updated;
(3) when more than 20 s elapse between the current count change and the previous one, tallying moves to the next object pool.
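Steps (1) to (3) can be sketched as a small recorder that is fed one counter reading at a time. This is a minimal Python illustration rather than the patented implementation: the class name `PoolRecorder`, the dictionary layout of a pool entry, and the decision to treat the very first reading as a baseline instead of a change are all assumptions. How the counters are read from hardware is left out.

```python
from typing import List, Optional, Tuple


class PoolRecorder:
    """Records count changes into a ring of object pools (illustrative only)."""

    def __init__(self, pool_count: int = 51, gap_s: float = 20.0) -> None:
        self.pools: List[Optional[dict]] = [None] * pool_count
        self.gap_s = gap_s
        self.current = -1                                   # no pool used yet
        self.last_counts: Optional[Tuple[int, int]] = None

    def on_sample(self, now: float, bad_tlp: int, bad_dllp: int) -> Optional[int]:
        """Process one reading; return the pool index written, or None if unchanged."""
        counts = (bad_tlp, bad_dllp)
        if self.last_counts is None:        # assumption: first reading is a baseline
            self.last_counts = counts
            return None
        if counts == self.last_counts:      # step (1): neither counter changed
            return None
        self.last_counts = counts

        rec = self.pools[self.current] if self.current >= 0 else None
        if rec is not None and now - rec["time"] <= self.gap_s:
            # Step (2): change within the gap, update the current pool.
            rec.update(time=now, bad_tlp=bad_tlp, bad_dllp=bad_dllp,
                       changes=rec["changes"] + 1)
        else:
            # Step (3): gap exceeded, move to the next pool and overwrite it.
            self.current = (self.current + 1) % len(self.pools)
            self.pools[self.current] = {"time": now, "bad_tlp": bad_tlp,
                                        "bad_dllp": bad_dllp, "changes": 1}
        return self.current
```

A polling loop would call `on_sample` with a timestamp and the two counter values on each read; the returned index shows which pool the change was tallied in.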
When the count changes 4 times in succession within 20 s, or all 51 object pools are used within 1 hour, an anomaly has occurred, and an alarm may be raised and/or the PCIE link interrupted.
Note that statistics are gathered only while the link is in its normal operating stage, not while a device is being plugged or unplugged or while the link is changing. In addition, if all 51 object pools are used up within 1 hour, cyclic overwriting may be stopped to simplify the statistics, that is, no further monitoring information is stored, and an alarm is raised and/or the PCIE link is interrupted.
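The anomaly check over the pool contents might look like the following Python sketch. It assumes each used pool is a dictionary with a `time` entry (last observed change) and a `changes` entry (changes in that segment), and it follows the wording of this worked example: 4 changes in one segment, or every pool used with the oldest entry no more than 1 hour old. A reading based on "exceeds" would use strict comparisons instead. The function name `evaluate` and the returned labels are hypothetical.

```python
from typing import List, Optional


def evaluate(pools: List[Optional[dict]], now: float,
             max_changes: int = 4, window_s: float = 3600.0) -> Optional[str]:
    """Return a label for the anomaly found in the pools, or None if there is none."""
    used = [p for p in pools if p is not None]

    # A single segment has reached the count threshold.
    if any(p["changes"] >= max_changes for p in used):
        return "burst"

    # Every pool is in use and even the oldest entry lies inside the window.
    if len(used) == len(pools) and now - min(p["time"] for p in used) <= window_s:
        return "too_many_segments"

    return None
```

A caller that receives a non-`None` result could stop storing further monitoring information, then raise the alarm and/or interrupt the link.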
Example two
As shown in fig. 3, on the basis of the first embodiment, the present embodiment provides a PCIE link error statistics apparatus, which includes the following functional modules.
The count monitoring module 101: monitors the non-fatal error count of the PCIE link in real time;
the first exception handling module 102: when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raises an alarm and/or interrupts the PCIE link;
the second exception handling module 103: when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raises an alarm and/or interrupts the PCIE link; a change within the first preset duration counts as one segment of change;
the object pool application module 104: applies for a plurality of object pools, the number of object pools being equal to the second segment-number threshold;
the monitoring information recording module 105: when a change in the non-fatal error count is observed, records monitoring information in the corresponding object pool; if the non-fatal error count changes continuously within the first preset duration, continues to update the monitoring information in the current object pool; if the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, moves to the next object pool to record monitoring information, the object pools being used cyclically in order, with each pass overwriting earlier entries.
The second exception handling module monitors whether all object pools have been used, with overwriting, within the second preset duration; if so, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
The recorded monitoring information includes: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment. The non-fatal error count includes a data link layer packet error count and a transaction layer packet error count.
Example three
This embodiment provides a terminal that includes a processor and a memory.
The memory stores instructions for execution by the processor. The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. When executed by the processor, the instructions in the memory enable the terminal to perform some or all of the steps in the above method embodiments.
The processor is the control center of the terminal. It connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and/or processes data by running or executing software programs and/or modules stored in the memory and by calling data stored in the memory. The processor may consist of an integrated circuit (IC), for example a single packaged IC, or of several connected packaged ICs with the same or different functions.
Example four
This embodiment provides a computer storage medium. The storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
The above discloses only the preferred embodiments of the present invention, and the present invention is not limited thereto. Any non-inventive change that a person skilled in the art can conceive of, and any modification or refinement made without departing from the principle of the present invention, shall fall within the protection scope of the present invention.

Claims (10)

1. A PCIE link error statistical method is characterized by comprising the following steps:
monitoring the non-fatal error count of the PCIE link in real time;
when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raising an alarm and/or interrupting the PCIE link;
when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raising an alarm and/or interrupting the PCIE link; a change within the first preset duration counts as one segment of change.
2. The PCIE link error statistic method of claim 1, further comprising the steps of:
applying for a plurality of object pools, the number of object pools being equal to the second segment-number threshold;
when a change in the non-fatal error count is observed, recording monitoring information in the corresponding object pool;
if the non-fatal error count changes continuously within the first preset duration, continuing to update the monitoring information in the current object pool;
if the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, moving to the next object pool to record monitoring information, the object pools being used cyclically in order, with each pass overwriting earlier entries;
if all object pools have been used, with overwriting, within the second preset duration, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
3. The PCIE link error statistical method of claim 2, wherein the recorded monitoring information comprises: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment.
4. The PCIE link error statistical method of claim 1, 2 or 3, wherein the non-fatal error count includes a data link layer packet error count and a transaction layer packet error count.
5. A PCIE link error statistic device is characterized in that it includes,
a count monitoring module: monitoring the non-fatal error count of the PCIE link in real time;
a first exception handling module: when the non-fatal error count is observed to change continuously within a first preset duration and the number of changes exceeds a first count threshold, raising an alarm and/or interrupting the PCIE link;
a second exception handling module: when the non-fatal error count is observed to undergo N segments of change within a second preset duration, where N exceeds a second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, raising an alarm and/or interrupting the PCIE link; a change within the first preset duration counts as one segment of change.
6. The PCIE link error statistics apparatus of claim 5, further comprising,
an object pool application module: applying for a plurality of object pools, the number of object pools being equal to the second segment-number threshold;
a monitoring information recording module: when a change in the non-fatal error count is observed, recording monitoring information in the corresponding object pool; if the non-fatal error count changes continuously within the first preset duration, continuing to update the monitoring information in the current object pool; if the interval between the next change of the non-fatal error count and the previous change is greater than the first preset duration, moving to the next object pool to record monitoring information, the object pools being used cyclically in order, with each pass overwriting earlier entries;
the second exception handling module monitors whether all object pools have been used, with overwriting, within the second preset duration; if so, this indicates that the non-fatal error count has undergone N segments of change within the second preset duration, where N exceeds the second segment-number threshold and the number of changes in each continuous change does not exceed the first count threshold, and an alarm is raised and/or the PCIE link is interrupted.
7. The PCIE link error statistical device of claim 6, wherein the recorded monitoring information includes: the time at which a change in the non-fatal error count was last observed, the latest value of the non-fatal error count, and the number of times the non-fatal error count has changed in the current segment.
8. The PCIE link error statistical device of claim 5, 6 or 7, wherein the non-fatal error count includes a data link layer packet error count and a transaction layer packet error count.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202010990038.3A 2020-09-18 2020-09-18 PCIE link error statistical method, device, terminal and storage medium Active CN112256539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010990038.3A CN112256539B (en) 2020-09-18 2020-09-18 PCIE link error statistical method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010990038.3A CN112256539B (en) 2020-09-18 2020-09-18 PCIE link error statistical method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112256539A (en) 2021-01-22
CN112256539B (en) 2022-07-19

Family

ID=74232332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010990038.3A Active CN112256539B (en) 2020-09-18 2020-09-18 PCIE link error statistical method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112256539B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448785B (en) * 2021-05-28 2023-03-28 山东英信计算机技术有限公司 Method, device and equipment for processing bandwidth state exception and readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130339825A1 (en) * 2012-06-13 2013-12-19 International Business Machines Corporation External settings that reconfigure the error handling behavior of a distributed pcie switch
CN106201753A (en) * 2016-06-28 2016-12-07 浪潮(北京)电子信息产业有限公司 Method and system for handling PCIE errors in Linux
US20180095817A1 (en) * 2015-09-11 2018-04-05 Huawei Technologies Co., Ltd. Method and apparatus for disconnecting link between pcie device and host
CN110532120A (en) * 2019-07-28 2019-12-03 苏州浪潮智能科技有限公司 Method and apparatus for monitoring PCIe uncorrectable errors in a server system

Also Published As

Publication number Publication date
CN112256539A (en) 2021-01-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant