CN110532120A - Method and apparatus for monitoring PCIe uncorrectable errors in a server system
- Publication number
- CN110532120A (application CN201910685722.8A)
- Authority
- CN
- China
- Prior art keywords
- pcie
- correctable error
- corresponding document
- return value
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
- G06F11/0793—Remedial or corrective actions
Abstract
The invention discloses a method for monitoring PCIe uncorrectable errors in a server system. At the start time of a scan cycle, the states of all PCIe devices of the server system are traversed, and the status information obtained for each PCIe device is saved in a corresponding document. The state of the PCIe device saved in each device's corresponding document is searched for a PCIe uncorrectable error. If a PCIe uncorrectable error is found, an email is sent to report it; the email includes attribute information of the device on which the error occurred and type information of the error found. The method can identify both the PCIe device on which an uncorrectable error occurred and the error type. A corresponding apparatus for monitoring PCIe uncorrectable errors in a server system is also disclosed.
Description
Technical field
The present invention relates to server monitoring technology, and in particular to a method and apparatus for monitoring PCIe uncorrectable errors in a server system.
Background
PCIe (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard intended to replace the older PCI bus. Most current motherboards provide multiple PCIe slots; an external device is inserted into a PCIe slot and communicates with the host over the PCIe bus. In current mainstream server designs, most external devices are connected to the server system through PCIe slots (referred to herein as "PCIe devices"), for example network cards, RAID (Redundant Array of Independent Disks) cards and HBA (Host Bus Adapter) cards, as well as NVM Express (Non-Volatile Memory Express, the non-volatile memory host controller interface specification) devices and the bridge devices in the system. Under Linux, when one of these devices has an uncorrectable error, it is important for the stability and security of the server system that the system administrator discover the error promptly. Once a PCIe uncorrectable error occurs, it may make the PCIe link and the PCIe device unreliable, and the operating system installed on the server may need to reset the abnormal link and/or the abnormal PCIe device.
After an existing server is powered on and its operating system is running, if a PCIe uncorrectable error occurs, the system's driver may reset the corresponding link or device, but it neither alerts the administrator nor tells the administrator which type of error occurred. If the error recurs, it may cause serious problems such as a system reboot, a crash or data loss. Because the system administrator cannot learn the specific PCIe error type, and is not notified in time to handle the abnormal machine in advance, severe cases may end in data loss, server downtime or a reboot, with potentially serious economic loss.
Summary of the invention
To solve the above technical problem, the present invention provides a method and apparatus for monitoring PCIe uncorrectable errors in a server system, capable of identifying the PCIe device on which a PCIe uncorrectable error occurred and the error type.
To achieve the object of the invention, an embodiment of the invention provides a method for monitoring PCIe uncorrectable errors in a server system, the method comprising:
At the start time of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the PCIe uncorrectable error found; wherein the email includes attribute information of the device on which the PCIe uncorrectable error occurred and type information of the PCIe uncorrectable error found.
In an optional embodiment, the step of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in each PCIe device's corresponding document, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the step of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with the return value representing the error.
In an optional embodiment, the step of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the step of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error, the method further comprises:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end time of the scan cycle is reached.
In another aspect, an embodiment of the invention provides an apparatus for monitoring PCIe uncorrectable errors in a server system, the apparatus comprising a processor and a memory;
The memory is configured to store computer-readable instructions;
The processor is configured to execute the computer-readable instructions to perform the following operations:
At the start time of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the PCIe uncorrectable error found; wherein the email includes attribute information of the device on which the PCIe uncorrectable error occurred and type information of the PCIe uncorrectable error found.
In an optional embodiment, the operation of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in each PCIe device's corresponding document, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the operation of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with the return value representing the error.
In an optional embodiment, the operation of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the operation of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error, the processor further performs the following operation:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end time of the scan cycle is reached.
The beneficial effect of the embodiments of the invention is that, by looking up the return values obtained when scanning PCIe devices for uncorrectable errors, both the PCIe device on which a PCIe uncorrectable error occurred and the error type can be identified. Based on the attribute information of that device and the type information of the error found, the system administrator can promptly repair the faulty PCIe device. This avoids the serious problems that repeated errors of this kind may cause, such as a system reboot, a crash, data loss or server downtime, and the serious economic loss that may follow.
Other features and advantages of the invention will be set forth in the description that follows, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
Brief description of the drawings
The drawings provide a further understanding of the technical solution of the invention and constitute a part of the description. Together with the embodiments of this application, they serve to explain the technical solution of the invention and do not limit it.
Fig. 1 is a flowchart of the method for monitoring PCIe uncorrectable errors in a server system provided by an embodiment of the invention;
Fig. 2 is a block diagram of the apparatus for monitoring PCIe uncorrectable errors in a server system provided by an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the drawings. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
To solve the above problems, in one aspect, an embodiment of the invention provides a method for monitoring PCIe uncorrectable errors in a server system. As shown in Fig. 1, the method comprises steps S101 to S105.
Step S101: at the start time of a scan cycle, traverse the states of all PCIe devices of the server system and save the status information obtained for each PCIe device in a corresponding document.
Step S101 begins at the start time of a scan cycle. Scanning repeats with the scan cycle as its period: the states of all PCIe devices connected to the PCIe bus in the server system are scanned cycle after cycle, so that the devices are continually checked for PCIe uncorrectable errors. A corresponding document is created for each PCIe device in the server system, and the document holds the status information of that PCIe device at the time of the scan.
The method provided by the embodiment of the invention is described in detail below, taking as an example an implementation in the Linux operating system installed on the server.
In the Linux operating system, the corresponding document for the PCIe devices can be created with the following program code:
lspci -vvv > /tmp/pcie_uncorrect.txt
The lspci command displays all PCI bus devices in the system, that is, all devices connected to the bus. The command above saves the status information of the PCIe devices connected to the PCI bus in the document pcie_uncorrect.txt.
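The redirect above writes every device into a single file, whereas the method speaks of one corresponding document per device. The following is a minimal sketch of one way to bridge that gap, not the patent's own code. It assumes the dump has the usual lspci layout (each device header starts at column 0 with its bus address, and detail lines are indented). The function name `split_lspci_dump`, the output directory and the `pcie_<address>.txt` naming are hypothetical. On many systems lspci must also run as root for the capability details to be readable.

```sh
# Hypothetical helper: split one lspci -vvv dump into one file per device.
# $1 = path of the dump, $2 = directory that receives the per-device files.
split_lspci_dump() {
    dump=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        # A device header starts at column 0 with a hex bus address.
        /^[0-9a-fA-F]/ {
            if (out != "") close(out)
            addr = $1
            gsub(/[:.]/, "_", addr)      # 84:00.0 -> 84_00_0
            out = dir "/pcie_" addr ".txt"
        }
        # Copy the header and its non-empty detail lines to the current file.
        out != "" && NF > 0 { print > out }
    ' "$dump"
}

# Example (assumed paths):
#   lspci -vvv > /tmp/pcie_uncorrect.txt
#   split_lspci_dump /tmp/pcie_uncorrect.txt /tmp/pcie_docs
```

Encoding the bus address in the file name is only one possible choice of index; any scheme works as long as a document can be traced back to its device.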
Step S103: search the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error.
In the pcie_uncorrect.txt document created in step S101, the status values of a PCIe device look as follows:
84:00.0 RAID bus controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
        Subsystem: Adaptec Device 0555
        Physical Slot: 0-9
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 64
        Region 0: Memory at fbd00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at fbe80000 (64-bit, non-prefetchable) [size=1K]
        Region 4: I/O ports at f000 [size=256]
        Expansion ROM at fbe00000 [disabled] [size=512K]
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 unlimited, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Via message
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [300 v1] #19
        Kernel driver in use: aacraid
        Kernel modules: aacraid
cat /tmp/pcie_uncorrect.txt | grep -i "UESta" | grep -i -E "DLP\+|SDES\+|TLP\+|FCP\+|CmpltTO\+|CmpltAbrt\+|UnxCmplt\+|RxOF\+|MalfTLP\+|ECRC\+|UnsupReq\+|ACSViol\+"
The command above searches the document for information indicating whether a PCIe device has a PCIe uncorrectable error.
Step S105: if a PCIe uncorrectable error is found, send an email to report the PCIe uncorrectable error found; the email includes attribute information of the device on which the error occurred and type information of the error found.
Continuing from the program code above, step S105 can be implemented as follows.
If the method above finds that a PCIe device in the server system has a PCIe uncorrectable error, an email is sent to the system administrator of the server system. The email tells the administrator that a PCIe device has a fault that needs to be checked and repaired. The email contains the attribute information of the device on which the PCIe uncorrectable error occurred (attribute information means information that helps locate the device and confirm its identity, such as the device number, port address and IP address) and the type information of the error found. With this information, the system administrator can promptly repair the faulty PCIe device, avoiding the serious problems that repeated errors of this kind may cause, such as a system reboot, a crash, data loss or server downtime, and the serious economic loss that may follow.
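The source does not reproduce the mail-sending code, so the following is only an illustrative sketch of how the notification in step S105 might be composed. The function name `compose_alert`, the recipient address and the message layout are all assumptions. The function only builds the message text; handing it to a mail transfer agent is shown as a comment and assumes a sendmail-compatible program is installed.

```sh
# Hypothetical helper: build an alert message for one faulty device.
# $1 = recipient, $2 = device attribute string, remaining args = error codes.
compose_alert() {
    to=$1
    device=$2
    shift 2
    printf 'To: %s\n' "$to"
    printf 'Subject: PCIe uncorrectable error on %s\n' "$(uname -n)"
    printf '\n'                                   # blank line ends the headers
    printf 'Device: %s\n' "$device"
    printf 'Uncorrectable error types: %s\n' "$*"
}

# Example (assumed recipient and MTA):
#   compose_alert admin@example.com "84:00.0 RAID bus controller" DLP CmpltTO | sendmail -t
```

Keeping composition separate from delivery lets the message be inspected or logged before anything is sent.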
In an optional embodiment, step S103 includes:
Looking up, in each PCIe device's corresponding document, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
Continuing with the output above, the line UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- represents the state of the PCIe device. By looking up the return values in this line, it can be determined whether a PCIe uncorrectable error exists. The return value referred to here is the - or + that follows each PCIe uncorrectable error type code. The meaning of the return value can be set by convention. For example, - indicates that the PCIe device corresponding to the document does not have a PCIe uncorrectable error of the type represented by the code preceding the -, while + indicates that the device does have a PCIe uncorrectable error of the type represented by the code preceding the +. In this way it can be determined whether a PCIe uncorrectable error exists.
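The +/- convention described above can be read mechanically. Here is a minimal sketch, assuming the UESta line has lspci's normal spacing (each code separated by whitespace and followed directly by + or -). The function name `uesta_set_flags` is hypothetical.

```sh
# Hypothetical helper: print every UESta code whose return value is "+".
# $1 = path of a document holding lspci -vvv output.
uesta_set_flags() {
    awk '
        tolower($0) ~ /uesta:/ {
            sub(/^[^:]*:/, "")               # drop the "UESta:" label
            for (i = 1; i <= NF; i++)
                if ($i ~ /[+]$/) {           # code followed by "+": error present
                    sub(/[+]$/, "", $i)
                    print $i
                }
        }
    ' "$1"
}

# Example: uesta_set_flags /tmp/pcie_uncorrect.txt
```

An empty result means every code carried a -, so no uncorrectable error was recorded. Matching whole tokens also avoids a subtlety of substring patterns, where `TLP\+` would match `MalfTLP+` as well.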
In an optional embodiment, step S105 includes:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with the return value representing the error.
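The lookup from code to error type can be sketched as a simple table. The expansions below follow the field names of the PCIe Advanced Error Reporting Uncorrectable Error Status register as abbreviated by lspci; they are not listed in the source and should be checked against the PCIe specification. The function name `uesta_describe` is hypothetical.

```sh
# Hypothetical helper: map an lspci UESta code to a readable error type.
uesta_describe() {
    case $1 in
        DLP)       echo "Data Link Protocol Error" ;;
        SDES)      echo "Surprise Down Error" ;;
        TLP)       echo "Poisoned TLP" ;;
        FCP)       echo "Flow Control Protocol Error" ;;
        CmpltTO)   echo "Completion Timeout" ;;
        CmpltAbrt) echo "Completer Abort" ;;
        UnxCmplt)  echo "Unexpected Completion" ;;
        RxOF)      echo "Receiver Overflow" ;;
        MalfTLP)   echo "Malformed TLP" ;;
        ECRC)      echo "ECRC Error" ;;
        UnsupReq)  echo "Unsupported Request Error" ;;
        ACSViol)   echo "ACS Violation" ;;
        *)         echo "Unknown code: $1" ;;
    esac
}

# Example: uesta_describe CmpltTO
```

Returning a readable name lets the notification tell the administrator what went wrong without a trip to the register documentation.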
In an optional embodiment, step S105 includes:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred from the index of the PCIe device's corresponding document.
Because each PCIe device corresponds to one txt document, and that document records whether the PCIe device has a PCIe uncorrectable error, the device on which a PCIe error occurred can be determined from the index of the document when the error is found.
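The source does not say what form the document index takes. One possible reading, sketched below under that assumption, is that the index selects a per-device document whose first line is the lspci device header, so the bus address and device description can be read back from it. The function name `doc_device` and the `address|description` output format are hypothetical.

```sh
# Hypothetical helper: recover the device identity from a per-device document.
# $1 = path of the document; prints "address|description".
doc_device() {
    head -n 1 "$1" | awk '{
        addr = $1          # first field is the bus address, e.g. 84:00.0
        $1 = ""            # remove it, leaving the description
        sub(/^ +/, "")
        printf "%s|%s\n", addr, $0
    }'
}

# Example: doc_device /tmp/pcie_docs/pcie_84_00_0.txt
```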
Based on the method of the above embodiments, reporting that a PCIe uncorrectable error has occurred, reporting the attribute information of the device on which it occurred, and/or reporting the type of the error makes it easier for the system administrator to deal with the affected PCIe device (for example by resetting, repairing or debugging it). This prevents the device from failing through repeated uncorrectable errors and thereby destabilizing the whole system. After the system administrator has dealt with the device, the PCIe uncorrectable error will have been cleared by the time the next PCIe scan cycle arrives, so the error that has been dealt with will not be detected again.
In an optional embodiment, after step S103, the method further includes:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end time of the scan cycle is reached.
The embodiment of the invention traverses the corresponding document of the PCIe device under each index by cyclic scanning. If, within one scan cycle, no PCIe uncorrectable error is found in the corresponding document of a PCIe device, the document created for that device at the start of the scan cycle is deleted. At the start of the next scan cycle, the PCIe devices are scanned and their corresponding documents are created again in the same way as in step S101; the details are not repeated here.
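The end-of-cycle cleanup can be sketched as follows, assuming the per-device documents are `.txt` files in one directory and that the UESta line uses lspci's normal spacing. The function name `cleanup_clean_docs` and the directory layout are hypothetical.

```sh
# Hypothetical helper: at the end of a scan cycle, delete every document
# whose UESta line reports no uncorrectable error.
# $1 = directory holding the per-device documents.
cleanup_clean_docs() {
    for doc in "$1"/*.txt; do
        [ -f "$doc" ] || continue
        # A letter directly followed by "+" on the UESta line marks an error.
        if ! grep -i 'UESta' "$doc" | grep -q -E '[A-Za-z][+]'; then
            rm -f -- "$doc"
        fi
    done
}

# Example: cleanup_clean_docs /tmp/pcie_docs
```

Documents that still show an error are left in place, so the evidence behind a notification survives until the administrator has looked at it.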
In another aspect, an embodiment of the invention provides an apparatus for monitoring PCIe uncorrectable errors in a server system. As shown in Fig. 2, the apparatus comprises a memory 10 and a processor 20.
The memory 10 is configured to store computer-readable instructions;
The processor 20 is configured to execute the computer-readable instructions to perform the following operations:
At the start time of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the PCIe uncorrectable error found; wherein the email includes attribute information of the device on which the PCIe uncorrectable error occurred and type information of the PCIe uncorrectable error found.
In an optional embodiment, the operation of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in each PCIe device's corresponding document, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
In an optional embodiment, the operation of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with the return value representing the error.
In an optional embodiment, the operation of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred from the index of the PCIe device's corresponding document.
In an optional embodiment, after the operation of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error, the processor 20 further performs the following operation:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end time of the scan cycle is reached.
Although embodiments are disclosed above, the content described is only embodiments adopted to facilitate understanding of the invention and is not intended to limit the invention. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be as defined by the appended claims.
Claims (10)
1. A method for monitoring PCIe uncorrectable errors in a server system, characterized by comprising:
At the start time of a scan cycle, traversing the states of all PCIe devices of the server system and saving the status information obtained for each PCIe device in a corresponding document;
Searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error;
If a PCIe uncorrectable error is found, sending an email to report the PCIe uncorrectable error found; wherein the email includes attribute information of the device on which the PCIe uncorrectable error occurred and type information of the PCIe uncorrectable error found.
2. The method according to claim 1, wherein the step of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error comprises:
Looking up, in each PCIe device's corresponding document, the return value that represents the PCIe device state, and determining from the return value whether a PCIe uncorrectable error exists.
3. The method according to claim 2, wherein the step of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, looking up the type of PCIe uncorrectable error that corresponds to the code associated with the return value representing the error.
4. The method according to claim 2, wherein the step of sending an email to report the PCIe uncorrectable error found, if a PCIe uncorrectable error is found, comprises:
If a PCIe uncorrectable error is found according to the return value, determining the device on which the PCIe uncorrectable error occurred from the index of the PCIe device's corresponding document.
5. The method according to claim 1, characterized in that, after the step of searching the state of the PCIe device saved in each PCIe device's corresponding document for a PCIe uncorrectable error, the method further comprises:
If no PCIe uncorrectable error is found, deleting the corresponding document of the PCIe device when the end time of the scan cycle is reached.
6. the device of PCIe not correctable error in a kind of monitoring server system, which is characterized in that including processor and storage
Device;
The memory is for storing computer-readable instruction;
The processor is for executing the computer-readable instruction, to perform the following operations:
In the initial time in scan cycle period, the state of whole PCIe devices to server system is carried out traversal acquisition
The status information of each PCIe device is stored in respectively among corresponding document;
The state for searching the PCIe device saved in the corresponding document of each PCIe device whether there is PCIe not correctable error;
If finding PCIe not correctable error, mail is sent to prompt the PCIe found not correctable error;Wherein,
It include the PCIe not attribute information of the equipment of correctable error and the PCIe found not correctable error occur in the mail
Type information.
7. device according to claim 6, wherein described to search the PCIe saved in the corresponding document of each PCIe device
The state of equipment includes: with the presence or absence of the operation of PCIe not correctable error
The return value that PCIe device state is represented among the corresponding document of each PCIe device is searched, according to the return value
To determine whether there is PCIe not correctable error.
8. The apparatus according to claim 7, wherein the operation of sending an email to report the PCIe uncorrectable error found,
if a PCIe uncorrectable error is found, comprises:
if a PCIe uncorrectable error is found according to the return value, looking up, according to the code corresponding to the
return value that represents the PCIe uncorrectable error, the type of PCIe uncorrectable error corresponding to that code.
9. The apparatus according to claim 7, wherein the operation of sending an email to report the PCIe uncorrectable error found,
if a PCIe uncorrectable error is found, comprises:
if a PCIe uncorrectable error is found according to the return value, determining the device on which the
PCIe uncorrectable error occurred according to the index of the corresponding file of the PCIe device.
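One way to read claim 9 is that each file carries an index that points back to a device in the order of traversal. The sketch below assumes that reading: the file name pattern `pcie_<index>.status` and the function `device_for_file` are hypothetical, and the claim does not state how the index is actually encoded.

```python
import re
from typing import Sequence

# Hypothetical naming scheme: the index is embedded in the file name.
_INDEX_RE = re.compile(r"pcie_(\d+)\.status$")


def device_for_file(filename: str, devices: Sequence[str]) -> str:
    """Map a per-device status file back to its device via the file's index."""
    match = _INDEX_RE.search(filename)
    if match is None:
        raise ValueError(f"no index found in file name: {filename!r}")
    index = int(match.group(1))
    if index >= len(devices):
        raise IndexError(f"index {index} has no matching device")
    return devices[index]
```

The device list must be in the same order as when the files were written, otherwise the index resolves to the wrong device.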
10. The apparatus according to claim 6, characterized in that, after the operation of checking whether the state of the
PCIe device saved in the corresponding file of each PCIe device indicates a PCIe uncorrectable error, the processor further performs the following
operation:
if no PCIe uncorrectable error is found, deleting the corresponding files of the PCIe devices
when the end time of the scan cycle is reached.
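The clean-up condition in claims 5 and 10 has two parts: no uncorrectable error was found, and the scan cycle has reached its end time. A minimal sketch, assuming the files are ordinary paths on disk and that time is measured with a monotonic clock (the function name and parameters are hypothetical):

```python
import time
from pathlib import Path
from typing import Iterable, Optional


def cleanup_if_clean(
    files: Iterable[Path],
    error_found: bool,
    cycle_end: float,
    now: Optional[float] = None,
) -> bool:
    """Delete the per-device files only if the cycle has ended and no error was found.

    Returns True when the files were deleted, False when they were kept.
    """
    now = time.monotonic() if now is None else now
    if error_found or now < cycle_end:
        return False
    for path in files:
        path.unlink(missing_ok=True)  # missing_ok requires Python 3.8 or later
    return True
```

In this sketch the files are simply left in place when an error was found; what happens to them afterwards is outside the claim.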
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910685722.8A CN110532120A (en) | 2019-07-28 | 2019-07-28 | The method and apparatus of PCIe not correctable error in monitoring server system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110532120A (en) | 2019-12-03 |
Family
ID=68660958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910685722.8A Withdrawn CN110532120A (en) | 2019-07-28 | 2019-07-28 | The method and apparatus of PCIe not correctable error in monitoring server system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532120A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767242A (en) * | 2020-05-28 | 2020-10-13 | 西安广和通无线软件有限公司 | PCIE equipment control method and device, computer equipment and storage medium |
CN111767242B (en) * | 2020-05-28 | 2022-04-15 | 西安广和通无线软件有限公司 | PCIE equipment control method and device, computer equipment and storage medium |
CN112256539A (en) * | 2020-09-18 | 2021-01-22 | 苏州浪潮智能科技有限公司 | PCIE link error statistical method, device, terminal and storage medium |
CN112256539B (en) * | 2020-09-18 | 2022-07-19 | 苏州浪潮智能科技有限公司 | PCIE link error statistical method, device, terminal and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP6333410B2 (en) | Fault processing method, related apparatus, and computer | |
US8069371B2 (en) | Method and system for remotely debugging a hung or crashed computing system | |
US8424016B2 (en) | Techniques to manage critical region interrupts | |
US11604711B2 (en) | Error recovery method and apparatus | |
US20100162045A1 (en) | Method, apparatus and system for restarting an emulated mainframe iop | |
US11175977B2 (en) | Method and system to detect failure in PCIe endpoint devices | |
CN110532120A (en) | The method and apparatus of PCIe not correctable error in monitoring server system | |
US20240048468A1 (en) | Traffic monitoring method and apparatus for open stack tenant network | |
US11442831B2 (en) | Method, apparatus, device and system for capturing trace of NVME hard disc | |
CN117453442A (en) | Recording method, device, equipment and storage medium for server error reporting information | |
CN109885420B (en) | PCIe link fault analysis method, BMC and storage medium | |
CN116932274B (en) | Heterogeneous computing system and server system | |
CN114281639A (en) | Storage server fault SAS physical link shielding device and method | |
Cisco | ENVM through FR_FRAG | |
CN107451035B (en) | Error state data providing method for computer device | |
CN115629825B (en) | Server and asset information acquisition method, asset information providing method and asset information providing device | |
TWI715005B (en) | Monitor method for demand of a bmc | |
US20230236917A1 (en) | Attributing errors to input/output peripheral drivers | |
TWI602054B (en) | Method of providing error status data for computer device | |
CN112084049A (en) | Method for monitoring resident program of baseboard management controller | |
CN117931536A (en) | Fault processing method, device, electronic equipment and medium | |
CN117687821A (en) | Method and device for processing bad blocks of cache memory and electronic equipment | |
CN117472622A (en) | Method, device, equipment and storage medium for isolating fault memory | |
CN115495291A (en) | Method and apparatus for facilitating recording of system fatal errors | |
Headquarters | Integrated Services Router (ISR) G2 Manageability Document |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191203 |