CN110532120A - Method and apparatus for monitoring PCIe uncorrectable errors in a server system - Google Patents

Method and apparatus for monitoring PCIe uncorrectable errors in a server system

Info

Publication number
CN110532120A
Authority
CN
China
Prior art keywords
pcie
correctable error
corresponding document
return value
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910685722.8A
Other languages
Chinese (zh)
Inventor
张建业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201910685722.8A
Publication of CN110532120A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/0793 Remedial or corrective actions

Abstract

The invention discloses a method for monitoring PCIe uncorrectable errors in a server system. At the start of each scan cycle, the states of all PCIe devices in the server system are traversed, and the status information obtained for each PCIe device is saved in a corresponding document. The state saved in each device's corresponding document is then searched for a PCIe uncorrectable error. If a PCIe uncorrectable error is found, an email is sent to report it; the email contains attribute information of the device on which the error occurred and type information of the error found. The method can thus identify both the PCIe device on which a PCIe uncorrectable error occurs and the type of the error. A corresponding apparatus for monitoring PCIe uncorrectable errors in a server system is also disclosed.

Description

Method and apparatus for monitoring PCIe uncorrectable errors in a server system
Technical field
The present invention relates to server monitoring technology, and in particular to a method and apparatus for monitoring PCIe uncorrectable errors in a server system.
Background
PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard intended to replace the older PCI bus. Most current motherboards provide multiple PCIe slots; an external device can be inserted into a PCIe slot and communicate with the host over the PCIe bus. In mainstream server designs, most external devices are connected to the server system through PCIe slots (such devices are referred to herein as "PCIe devices"), for example network cards, RAID (Redundant Array of Independent Disks) cards, and HBA (Host Bus Adapter) cards, as well as NVM Express (Non-Volatile Memory Express, the non-volatile memory host controller interface specification) devices and bridge devices in the system. Under Linux, when one of these devices has an uncorrectable error, timely discovery of the error by the system administrator is very important for the stability and security of the server system. Once a PCIe uncorrectable error occurs, the PCIe link and the PCIe device may become unreliable, and the operating system installed on the server may need to reset the abnormal link and/or the abnormal PCIe device.
In existing servers, after power-on and while the operating system is running, if a PCIe uncorrectable error occurs, the system driver may reset the corresponding link or device, but it neither alerts the administrator nor reports which type of error occurred. If such errors recur, they may lead to serious problems such as system reboots, crashes, or data loss. Because the system administrator cannot learn the specific PCIe error type, is not notified, and therefore cannot deal with the abnormal machine in advance, the result may be data loss, server downtime, or reboots, and with them serious economic loss.
Summary of the invention
To solve the above technical problem, the present invention provides a method and apparatus for monitoring PCIe uncorrectable errors in a server system, which can identify the PCIe device on which a PCIe uncorrectable error occurs and the type of the error.
To achieve this object, an embodiment of the invention provides a method for monitoring PCIe uncorrectable errors in a server system, the method comprising:
At the start of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the error found, wherein the email contains attribute information of the device on which the PCIe uncorrectable error occurred and type information of the error found.
In an optional embodiment, the step of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in the corresponding document of each PCIe device, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the step of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with that return value.
In an optional embodiment, the step of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the step of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error, the method further comprises:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end of the scan cycle is reached.
In another aspect, an embodiment of the invention provides an apparatus for monitoring PCIe uncorrectable errors in a server system, the apparatus comprising a processor and a memory;
The memory is configured to store computer-readable instructions;
The processor is configured to execute the computer-readable instructions so as to perform the following operations:
At the start of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the error found, wherein the email contains attribute information of the device on which the PCIe uncorrectable error occurred and type information of the error found.
In an optional embodiment, the operation of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in the corresponding document of each PCIe device, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the operation of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with that return value.
In an optional embodiment, the operation of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the operation of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error, the processor further performs the following operation:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end of the scan cycle is reached.
The beneficial effect of the embodiments of the invention is that, by looking up the return values obtained when scanning PCIe devices for uncorrectable errors, both the PCIe device on which a PCIe uncorrectable error occurs and the type of the error can be identified. Using the attribute information of the affected device and the type information of the error found, the system administrator can promptly repair the faulty PCIe device, avoiding the serious problems that repeated errors of this kind may cause, such as system reboots, crashes, data loss, or server downtime, and the serious economic loss that may follow.
Other features and advantages of the invention will be set forth in the description that follows and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.
Brief description of the drawings
The drawings are provided for a further understanding of the technical solution of the invention and form part of the specification. Together with the embodiments of the application, they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a flowchart of the method for monitoring PCIe uncorrectable errors in a server system provided by an embodiment of the invention;
Fig. 2 is a block diagram of the apparatus for monitoring PCIe uncorrectable errors in a server system provided by an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the drawings. It should be noted that, where no conflict arises, the embodiments in the application and the features in those embodiments may be combined with one another in any way.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard intended to replace the older PCI bus. Most current motherboards provide multiple PCIe slots; an external device can be inserted into a PCIe slot and communicate with the host over the PCIe bus. In mainstream server designs, most external devices are connected to the server system through PCIe slots (such devices are referred to herein as "PCIe devices"), for example network cards, RAID (Redundant Array of Independent Disks) cards, and HBA (Host Bus Adapter) cards, as well as NVM Express (Non-Volatile Memory Express, the non-volatile memory host controller interface specification) devices and bridge devices in the system. Under Linux, when one of these devices has an uncorrectable error, timely discovery of the error by the system administrator is very important for the stability and security of the server system. Once a PCIe uncorrectable error occurs, the PCIe link and the PCIe device may become unreliable, and the operating system installed on the server may need to reset the abnormal link and/or the abnormal PCIe device.
In existing servers, after power-on and while the operating system is running, if a PCIe uncorrectable error occurs, the system driver may reset the corresponding link or device, but it neither alerts the administrator nor reports which type of error occurred. If such errors recur, they may lead to serious problems such as system reboots, crashes, or data loss. Because the system administrator cannot learn the specific PCIe error type, is not notified, and therefore cannot deal with the abnormal machine in advance, the result may be data loss, server downtime, or reboots, and with them serious economic loss.
To solve the above-mentioned problems, on the one hand, the embodiment of the invention provides PCIe in a kind of monitoring server system not The method of correctable error, as shown in Figure 1, the method comprising the steps of S101- step S105.
Step S101, in the initial time in scan cycle period, the state of whole PCIe devices to server system The status information for carrying out each PCIe device of traversal acquisition is stored in respectively among corresponding document.
Step S101 originates in the initial time in scan cycle period, here, being using the scan cycle period as loop cycle Reciprocally scanning is connected to the state of the whole PCIe devices for being connected to PCIe bus among server system, thus constantly Whether ground scanning PCIe device there is PCIe not correctable error.And it is created for each PCIe device in server system Corresponding document, status information of the corresponding PCIe device saved in document in scanning.
Herein for realizing the method for the embodiment of the present invention among the (SuSE) Linux OS installed among server, The method that the present invention is described in detail embodiment provides.
In (SuSE) Linux OS, in order to be embodied as the purpose that PCIe device creates corresponding document, program code is such as Under:
lspci–vvv>/tmp/pcie_uncorrect.txt
Lspci order is used to all pci bus equipment or all devices being connected in the bus in display system.To be The status information of the PCIe device for being connected to pci bus in system is saved among pcie_uncorrect.txt document.
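The text speaks of one document per PCIe device, while the command above writes a single file. One possible way to reconcile the two is sketched below. It is not taken from the patent: the function name split_lspci_dump, the output directory, and the file naming scheme are hypothetical, and the sketch assumes that each device record in the lspci -vvv output starts with an unindented bus address line.

```shell
# Hypothetical helper (not from the patent): split `lspci -vvv` style text
# read from stdin into one document per device under the directory given as $1.
# Assumption: every device record begins with an unindented line whose first
# field is the bus address, e.g. "84:00.0 RAID bus controller: ...".
split_lspci_dump() {
    mkdir -p "$1" || return 1
    awk -v dir="$1" '
        /^[0-9a-fA-F]/ {
            if (file != "") close(file)
            addr = $1
            gsub(/[^0-9a-fA-F]/, "_", addr)   # "84:00.0" becomes "84_00_0"
            file = dir "/" addr ".txt"
        }
        # Indented detail lines and blank separators go to the current device file.
        file != "" { print > file }
    '
}
```

A call such as `lspci -vvv | split_lspci_dump /tmp/pcie_scan` would then leave one text file per device, named after its bus address, which can serve as the index of the document.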
Step S103: search the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error.
In the pcie_uncorrect.txt document created in step S101, the configuration space values of a PCIe device are as follows:
    84:00.0 RAID bus controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
        Subsystem: Adaptec Device 0555
        Physical Slot: 0-9
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 64
        Region 0: Memory at fbd00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at fbe80000 (64-bit, non-prefetchable) [size=1K]
        Region 4: I/O ports at f000 [size=256]
        Expansion ROM at fbe00000 [disabled] [size=512K]
        Capabilities: [80] Power Management version 3
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
            Address: 0000000000000000  Data: 0000
            Masking: 00000000  Pending: 00000000
        Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
            Vector table: BAR=0 offset=00002000
            PBA: BAR=0 offset=00003000
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
            DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                MaxPayload 256 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
            LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 unlimited, L1 <64us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Via message
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
            UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
            CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [300 v1] #19
        Kernel driver in use: aacraid
        Kernel modules: aacraid
cat /tmp/pcie_uncorrect.txt | grep -i "UESta" | grep -i -E "DLP\+|SDES\+|TLP\+|FCP\+|CmpltTO\+|CmpltAbrt\+|UnxCmplt\+|RxOF\+|MalfTLP\+|ECRC\+|UnsupReq\+|ACSViol\+"
The command above searches the document for information indicating whether a PCIe device has a PCIe uncorrectable error.
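For use in a script, the same check can be wrapped so that it yields an exit status rather than printed lines. The following is a minimal sketch under stated assumptions, not the patent's implementation: the function name has_uncorrectable_error is hypothetical, and the sketch assumes lspci -vvv style text in which a set status bit is printed as a code followed by "+". It matches only the UESta line, so flags set in the UEMsk and UESvrt lines are ignored.

```shell
# Hypothetical helper (not from the patent): succeed (exit status 0) when the
# lspci dump in file $1 has a UESta line with at least one flag set ("+").
has_uncorrectable_error() {
    # The first grep keeps only the uncorrectable error status line(s);
    # the second looks for any code that is immediately followed by "+".
    grep 'UESta:' "$1" | grep -q -E '[A-Za-z]\+'
}
```

It could be used as `if has_uncorrectable_error /tmp/pcie_uncorrect.txt; then ...; fi`.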
Step S105: if a PCIe uncorrectable error is found, send an email to report the error found, wherein the email contains attribute information of the device on which the PCIe uncorrectable error occurred and type information of the error found.
Following on from the program code above, step S105 can be implemented by executing the following code.
If, according to the above method, a PCIe device in the server system is found to have a PCIe uncorrectable error, an email is sent to the system administrator of the server system. The email alerts the system administrator that a PCIe device has a fault and needs to be checked and repaired. The email contains the attribute information of the device on which the PCIe uncorrectable error occurred (attribute information means information that helps locate the device and confirm its identity, such as the device number, port address, and IP address) and the type information of the PCIe uncorrectable error found. Using this information, the system administrator can promptly repair the faulty PCIe device, avoiding the serious problems that repeated errors of this kind may cause, such as system reboots, crashes, data loss, or server downtime, and the serious economic loss that may follow.
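The source does not reproduce the code for this step. As an illustration only, the message body described above could be assembled as in the following sketch. The function name build_alert_mail, the wording of the message, and the choice of fields are assumptions; the sketch only formats text and does not itself send mail.

```shell
# Hypothetical helper (not from the patent): print an alert message.
#   $1 = attribute line of the device (e.g. the first line of its lspci record)
#   $2 = space-separated error type codes found set in the UESta line
build_alert_mail() {
    printf 'PCIe uncorrectable error detected on host %s\n' "$(uname -n)"
    printf 'Device: %s\n' "$1"
    printf 'Error types: %s\n' "$2"
    printf 'Please check and repair the device.\n'
}
```

On a system with a configured mail client, the output could be piped to it, for example `build_alert_mail "$device" "$types" | mail -s "PCIe uncorrectable error" admin@example.com`; the availability of the mail command and the recipient address are assumptions, not details from the source.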
In an optional embodiment, step S103 comprises:
Looking up, in the corresponding document of each PCIe device, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
Continuing from the output above, the line UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- represents the state of the PCIe device. By examining the return values in this line, it can be determined whether a PCIe uncorrectable error exists. The return value here is the "-" or "+" that follows each PCIe uncorrectable error type code. The meaning of each return value can be defined; for example, "-" indicates that the PCIe device corresponding to the document does not have a PCIe uncorrectable error of the type denoted by the preceding code, while "+" indicates that the device does have a PCIe uncorrectable error of that type. In this way it can be determined whether a PCIe uncorrectable error exists.
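The reading of the "+" and "-" return values can be sketched as follows. This is an illustrative sketch rather than the patent's code: the function name uesta_set_flags is hypothetical, and it assumes that the flags on the UESta line are separated by spaces or tabs, as in lspci -vvv output.

```shell
# Hypothetical helper (not from the patent): read lspci text on stdin and
# print, one per line, every UESta code whose return value is "+".
uesta_set_flags() {
    grep 'UESta:' |
        sed 's/.*UESta:[[:space:]]*//' |      # drop the "UESta:" label
        tr -s ' \t' '\n\n' |                  # one flag per line
        sed -n 's/^\([A-Za-z][A-Za-z]*\)+$/\1/p'   # keep codes that end in "+"
}
```

For a line such as `UESta: DLP+ SDES- TLP-` the function prints `DLP`; for a line in which every code is followed by "-" it prints nothing.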
In an optional embodiment, step S105 comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with that return value.
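The lookup from code to error type can be sketched as a simple table. The patent does not list the type names; the descriptions below are an assumption based on the fields of the Uncorrectable Error Status register of the PCIe Advanced Error Reporting capability, which lspci abbreviates in the UESta line. The function name uesta_code_description is hypothetical.

```shell
# Hypothetical helper (not from the patent): map a UESta code, as printed by
# lspci, to a readable error type name.
uesta_code_description() {
    case "$1" in
        DLP)       echo 'Data Link Protocol Error' ;;
        SDES)      echo 'Surprise Down Error' ;;
        TLP)       echo 'Poisoned TLP' ;;
        FCP)       echo 'Flow Control Protocol Error' ;;
        CmpltTO)   echo 'Completion Timeout' ;;
        CmpltAbrt) echo 'Completer Abort' ;;
        UnxCmplt)  echo 'Unexpected Completion' ;;
        RxOF)      echo 'Receiver Overflow' ;;
        MalfTLP)   echo 'Malformed TLP' ;;
        ECRC)      echo 'ECRC Error' ;;
        UnsupReq)  echo 'Unsupported Request Error' ;;
        ACSViol)   echo 'ACS Violation' ;;
        *)         echo "Unknown code: $1" ;;
    esac
}
```

The resulting name can be placed in the email as the type information of the error.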
In an optional embodiment, step S105 comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the error occurred from the index of the PCIe device's corresponding document.
Because each PCIe device corresponds to one txt document, and the document records whether that PCIe device has a PCIe uncorrectable error, the device on which the error occurred can be determined from the index of the document when a PCIe error is found.
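How the document index leads back to the device depends on how the documents are named, which the source does not specify. The sketch below assumes one .txt document per device in a single directory, with the file name serving as the index and the first line of each document holding the device's lspci header line. The function name faulty_devices and this layout are hypothetical.

```shell
# Hypothetical helper (not from the patent): for every per-device document in
# directory $1 whose UESta line has a flag set, print the document index
# (file name without ".txt") and the device header line, separated by a tab.
faulty_devices() {
    for doc in "$1"/*.txt; do
        [ -f "$doc" ] || continue
        if grep 'UESta:' "$doc" | grep -q -E '[A-Za-z]\+'; then
            printf '%s\t%s\n' "$(basename "$doc" .txt)" "$(sed -n '1p' "$doc")"
        fi
    done
}
```

Each output line pairs an index with the attribute line of the device, which is the information the email needs to identify the faulty device.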
With the method of the above embodiments, reporting that a PCIe uncorrectable error has occurred, reporting the attribute information of the device on which it occurred, and/or reporting the type of the error makes it convenient for the system administrator to deal with the affected PCIe device (for example, by resetting, repairing, or debugging it). This prevents the PCIe device from failing through repeated uncorrectable errors and destabilizing the whole system. After the system administrator has dealt with the device on which the PCIe uncorrectable error occurred, the error has been cleared by the time the next scan cycle arrives, so an error that has already been dealt with is not detected again.
In an optional embodiment, after step S103, the method further comprises:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end of the scan cycle is reached.
The embodiment of the invention traverses the corresponding document of the PCIe device under each index by cyclic scanning. If, within one scan cycle, no PCIe uncorrectable error is found in the corresponding document of a given PCIe device, the document created for that device at the start of the scan cycle is deleted. At the start of the next scan cycle, the PCIe devices are scanned and their corresponding documents are created again in the same way as in step S101 of the embodiment, which is not repeated here.
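The end-of-cycle cleanup can be sketched as follows, again assuming one .txt document per device in a single directory. The function name delete_clean_documents and the directory layout are hypothetical, and the length of the scan cycle is not given in the source.

```shell
# Hypothetical helper (not from the patent): at the end of a scan cycle,
# delete every per-device document in directory $1 whose UESta line shows
# no flag set. Documents without any UESta line are treated as clean too.
delete_clean_documents() {
    for doc in "$1"/*.txt; do
        [ -f "$doc" ] || continue
        if ! grep 'UESta:' "$doc" | grep -q -E '[A-Za-z]\+'; then
            rm -f "$doc"
        fi
    done
}
```

The function would be called once per cycle, for instance from a cron job or from a loop that sleeps for the chosen scan period; documents of devices with errors are left in place for inspection.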
In another aspect, an embodiment of the invention provides an apparatus for monitoring PCIe uncorrectable errors in a server system. As shown in Fig. 2, the apparatus comprises a memory 10 and a processor 20.
The memory 10 is configured to store computer-readable instructions;
The processor 20 is configured to execute the computer-readable instructions so as to perform the following operations:
At the start of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the error found, wherein the email contains attribute information of the device on which the PCIe uncorrectable error occurred and type information of the error found.
In an optional embodiment, the operation of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in the corresponding document of each PCIe device, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the operation of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with that return value.
In an optional embodiment, the operation of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the operation of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error, the processor 20 further performs the following operation:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end of the scan cycle is reached.
Although the embodiments disclosed herein are as described above, they are given only to facilitate understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be as defined by the appended claims.

Claims (10)

1. A method for monitoring PCIe uncorrectable errors in a server system, characterized by comprising:
At the start of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the error found, wherein the email contains attribute information of the device on which the PCIe uncorrectable error occurred and type information of the error found.
2. The method according to claim 1, wherein the step of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in the corresponding document of each PCIe device, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
3. The method according to claim 2, wherein the step of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with that return value.
4. The method according to claim 2, wherein the step of sending an email to report the error if a PCIe uncorrectable error is found comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the error occurred from the index of the PCIe device's corresponding document.
5. The method according to claim 1, characterized in that, after the step of searching the state of the PCIe device saved in each device's corresponding document for a PCIe uncorrectable error, the method further comprises:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end of the scan cycle is reached.
6. An apparatus for monitoring PCIe uncorrectable errors in a server system, characterized by comprising a processor and a memory;
The memory is configured to store computer-readable instructions;
The processor is configured to execute the computer-readable instructions to perform the following operations:
At the start time of a scan cycle, traversing the states of all PCIe devices of the server system and saving the collected status information of each PCIe device in a corresponding file;
Searching whether the state of the PCIe device saved in the file corresponding to each PCIe device contains a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the found PCIe uncorrectable error, wherein the email includes attribute information of the device on which the PCIe uncorrectable error occurred and type information of the found PCIe uncorrectable error.
7. The apparatus according to claim 6, wherein the operation of searching whether the state of the PCIe device saved in the file corresponding to each PCIe device contains a PCIe uncorrectable error comprises:
Looking up, in the file corresponding to each PCIe device, the return value that represents the state of the PCIe device, and determining from the return value whether a PCIe uncorrectable error exists.
8. The apparatus according to claim 7, wherein the operation of sending an email to report the found PCIe uncorrectable error, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code of the return value representing the PCIe uncorrectable error.
9. The apparatus according to claim 7, wherein the operation of sending an email to report the found PCIe uncorrectable error, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred according to the index of the file corresponding to the PCIe device.
10. The apparatus according to claim 6, characterized in that, after the operation of searching whether the state of the PCIe device saved in the file corresponding to each PCIe device contains a PCIe uncorrectable error, the processor further performs the following operation:
If no PCIe uncorrectable error is found, deleting the files corresponding to the PCIe devices when the end time of the scan cycle is reached.
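The processor operations recited in the apparatus claims amount to one scan cycle: collect, save, search, then either report or clean up. The following Python sketch shows one way such a cycle might look. The `collect` callable, the two-line file layout, the `0` value for a healthy device, and the error-type table passed in by the caller are hypothetical, and the email is only composed, not sent.

```python
from email.message import EmailMessage
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

OK = 0  # assumed return value meaning "no uncorrectable error"


def run_scan_cycle(
    collect: Callable[[], Dict[int, Tuple[str, int]]],
    work_dir: Path,
    error_types: Dict[int, str],
) -> Optional[EmailMessage]:
    """Run one scan cycle; return the alert email, or None if nothing was found."""
    # Start of the cycle: save each device's attributes and return value in its own file.
    files = {}
    for index, (attributes, code) in collect().items():
        path = work_dir / f"pcie_{index}.status"
        path.write_text(f"{attributes}\n{code}\n")
        files[index] = path

    # Search the saved state of every device for an uncorrectable error.
    report = []
    for index, path in sorted(files.items()):
        attributes, code_text = path.read_text().splitlines()
        code = int(code_text)
        if code != OK:
            kind = error_types.get(code, f"unknown code {code}")
            report.append(f"device {index} ({attributes}): {kind}")

    if not report:
        # Clean cycle: delete the files (done here immediately, standing in
        # for "when the end time of the scan cycle is reached").
        for path in files.values():
            path.unlink()
        return None

    # Error found: the email carries the device attributes and the error type.
    message = EmailMessage()
    message["Subject"] = "PCIe uncorrectable error detected"
    message.set_content("\n".join(report))
    return message
```

A caller could hand the returned message to `smtplib.SMTP.send_message` after setting the `From` and `To` headers; keeping delivery outside the function makes the cycle logic testable without a mail server.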
CN201910685722.8A 2019-07-28 2019-07-28 The method and apparatus of PCIe not correctable error in monitoring server system Withdrawn CN110532120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910685722.8A CN110532120A (en) 2019-07-28 2019-07-28 The method and apparatus of PCIe not correctable error in monitoring server system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910685722.8A CN110532120A (en) 2019-07-28 2019-07-28 The method and apparatus of PCIe not correctable error in monitoring server system

Publications (1)

Publication Number Publication Date
CN110532120A true CN110532120A (en) 2019-12-03

Family

ID=68660958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910685722.8A Withdrawn CN110532120A (en) 2019-07-28 2019-07-28 The method and apparatus of PCIe not correctable error in monitoring server system

Country Status (1)

Country Link
CN (1) CN110532120A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767242A (en) * 2020-05-28 2020-10-13 西安广和通无线软件有限公司 PCIE equipment control method and device, computer equipment and storage medium
CN111767242B (en) * 2020-05-28 2022-04-15 西安广和通无线软件有限公司 PCIE equipment control method and device, computer equipment and storage medium
CN112256539A (en) * 2020-09-18 2021-01-22 苏州浪潮智能科技有限公司 PCIE link error statistical method, device, terminal and storage medium
CN112256539B (en) * 2020-09-18 2022-07-19 苏州浪潮智能科技有限公司 PCIE link error statistical method, device, terminal and storage medium

Similar Documents

Publication Publication Date Title
JP6333410B2 (en) Fault processing method, related apparatus, and computer
US8069371B2 (en) Method and system for remotely debugging a hung or crashed computing system
US8424016B2 (en) Techniques to manage critical region interrupts
US11604711B2 (en) Error recovery method and apparatus
US20100162045A1 (en) Method, apparatus and system for restarting an emulated mainframe iop
US11175977B2 (en) Method and system to detect failure in PCIe endpoint devices
CN110532120A (en) The method and apparatus of PCIe not correctable error in monitoring server system
US20240048468A1 (en) Traffic monitoring method and apparatus for open stack tenant network
US11442831B2 (en) Method, apparatus, device and system for capturing trace of NVME hard disc
CN117453442A (en) Recording method, device, equipment and storage medium for server error reporting information
CN109885420B (en) PCIe link fault analysis method, BMC and storage medium
CN116932274B (en) Heterogeneous computing system and server system
CN114281639A (en) Storage server fault SAS physical link shielding device and method
Cisco ENVM through FR_FRAG
CN107451035B (en) Error state data providing method for computer device
CN115629825B (en) Server and asset information acquisition method, asset information providing method and asset information providing device
TWI715005B (en) Monitor method for demand of a bmc
US20230236917A1 (en) Attributing errors to input/output peripheral drivers
TWI602054B (en) Method of providing error status data for computer device
CN112084049A (en) Method for monitoring resident program of baseboard management controller
CN117931536A (en) Fault processing method, device, electronic equipment and medium
CN117687821A (en) Method and device for processing bad blocks of cache memory and electronic equipment
CN117472622A (en) Method, device, equipment and storage medium for isolating fault memory
CN115495291A (en) Method and apparatus for facilitating recording of system fatal errors
Headquarters Integrated Services Router (ISR) G2 Manageability Document

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20191203)