CN109947596A - Method, apparatus and related components for handling system crashes caused by PCIe device failure - Google Patents

Method, apparatus and related components for handling system crashes caused by PCIe device failure

Info

Publication number
CN109947596A
CN109947596A CN201910209284.8A CN201910209284A CN109947596A CN 109947596 A CN109947596 A CN 109947596A CN 201910209284 A CN201910209284 A CN 201910209284A CN 109947596 A CN109947596 A CN 109947596A
Authority
CN
China
Prior art keywords
failure
pcie device
system crash
data
machine processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910209284.8A
Other languages
Chinese (zh)
Inventor
刘冰
班华堂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Power Commercial Systems Co Ltd
Original Assignee
Inspur Power Commercial Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Power Commercial Systems Co Ltd filed Critical Inspur Power Commercial Systems Co Ltd
Priority to CN201910209284.8A priority Critical patent/CN109947596A/en
Publication of CN109947596A publication Critical patent/CN109947596A/en
Pending legal-status Critical Current

Landscapes

  • Retry When Errors Occur (AREA)

Abstract

This application discloses a method for handling system crashes caused by PCIe device failure, in the field of electronic technology. When a server system crash is detected, internal register data carrying fault log information is written to a reserved fault-log storage space; this data can later be used to determine the faulty PCIe device. When the register data has been written, a system reboot is triggered. After the reboot is triggered, the faulty PCIe device is determined and set to an unavailable state, so that the faulty device is isolated automatically. A device set to the unavailable state can be replaced at any convenient time, which avoids the service interruption caused by waiting for manual repair when the system crashes, and once the reboot completes, customer services continue to run on the available PCIe devices. This application also discloses a corresponding apparatus, device and computer-readable storage medium for handling system crashes caused by PCIe device failure, which have the same beneficial effects.

Description

Method, apparatus and related components for handling system crashes caused by PCIe device failure
Technical field
This application relates to the field of electronic technology, and in particular to a method, an apparatus, a device and a computer-readable storage medium for handling system crashes caused by PCIe device failure.
Background
Servers are the core of the entire network system and computing platform. With the rapid development of cloud computing and big data technology, more and more data centers are being built and the number of server systems is growing exponentially. The quality of service of a server system is the attribute users care about most.
The availability of a server system determines its quality of service. PCIe devices are major components of a server system. When a PCIe device suffers an unrecoverable fault, it may cause processor errors or an operating system fault, so it generally brings down the entire server system.
When an unrecoverable error in a PCIe device crashes the system, the conventional approach is to locate the faulty PCIe device after the crash is discovered and then replace it. Locating the device takes time, and the server administrator must then power the system off, replace the PCIe device and power the system on again, so the server system can stay offline for a long time. While offline, the server system cannot provide service, and customer services are interrupted.
Therefore, how to shorten server system downtime and improve the stability of customer services is a technical problem that those skilled in the art need to solve.
Summary of the invention
The purpose of this application is to provide a method for handling system crashes caused by PCIe device failure that greatly increases the time during which the server system provides service, reduces the time customer services are interrupted, and improves the stability of customer services. Another purpose of this application is to provide a corresponding apparatus, device and computer-readable storage medium, which have the same beneficial effects.
To solve the above technical problem, this application provides a method for handling system crashes caused by PCIe device failure, comprising:
when a server system crash is detected, writing internal register data carrying fault log information to a reserved fault-log storage space;
when the register data has been written, triggering a system reboot;
when the system reboot is triggered, determining the faulty PCIe device, wherein the faulty PCIe device is obtained by parsing the internal register data;
setting the faulty PCIe device to an unavailable state;
after the system reboot completes, running customer services on the available PCIe devices.
Optionally, writing the processor internal data to the reserved storage space comprises:
locating the error registers inside the CPU;
writing the data stored in the error registers to the reserved storage space.
Optionally, the method for handling system crashes caused by PCIe device failure further comprises:
after the system reboot completes, outputting a prompt indicating that the faulty PCIe device is abnormal.
Optionally, the method for handling system crashes caused by PCIe device failure further comprises:
when the faulty PCIe device is detected to be in an available state, deleting the data in the reserved storage space.
Optionally, determining the faulty PCIe device when the system reboot is triggered comprises:
when the system reboot is triggered, locating the faulty PCIe device according to the internal register data to obtain the faulty PCIe device.
This application discloses an apparatus for handling system crashes caused by PCIe device failure, comprising:
a data writing module, configured to write internal register data carrying fault log information to a reserved fault-log storage space when a server system crash is detected;
a reboot trigger module, configured to trigger a system reboot when the register data has been written;
a fault determination module, configured to determine the faulty PCIe device when the system reboot is triggered, wherein the faulty PCIe device is obtained by parsing the internal register data;
a fault marking module, configured to set the faulty PCIe device to an unavailable state;
a service execution module, configured to run customer services on the available PCIe devices after the system reboot completes.
Optionally, the data writing module comprises:
a locating submodule, configured to locate the error registers inside the CPU;
a writing submodule, configured to write the data stored in the error registers to the reserved storage space.
Optionally, the fault determination module is specifically a fault locating module, configured to: when the system reboot is triggered, locate the faulty PCIe device according to the internal register data to obtain the faulty PCIe device.
This application discloses a device for handling system crashes caused by PCIe device failure, comprising:
a memory, configured to store a program;
a processor, configured to implement the steps of the method for handling system crashes caused by PCIe device failure when executing the program.
This application discloses a readable storage medium on which a program is stored; when the program is executed by a processor, the steps of the method for handling system crashes caused by PCIe device failure are implemented.
With the method provided by this application, when a server system crash is detected, internal register data carrying fault log information is written to a reserved fault-log storage space; this data can later be used to determine the faulty PCIe device. When the register data has been written, a system reboot is triggered. After the reboot is triggered, the faulty PCIe device is determined and set to an unavailable state, so that it is isolated automatically and its influence on the server system is avoided, which improves the availability of the server system. In addition, a faulty PCIe device set to the unavailable state can be replaced at any convenient time, which avoids the service interruption caused by waiting for manual repair when the system crashes and greatly increases the time the server system provides service. After the system reboot completes, customer services can continue to run on the available PCIe devices, which improves their stability.
This application also discloses an apparatus, a device and a computer-readable storage medium for handling system crashes caused by PCIe device failure, which have the same beneficial effects and are not described again here.
Description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of this application; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method for handling system crashes caused by PCIe device failure provided by an embodiment of this application;
Fig. 2 is a structural block diagram of an apparatus for handling system crashes caused by PCIe device failure provided by an embodiment of this application;
Fig. 3 is a structural schematic diagram of a device for handling system crashes caused by PCIe device failure provided by an embodiment of this application.
Detailed description of the embodiments
The core of this application is to provide a method for handling system crashes caused by PCIe device failure that greatly increases the time during which the server system provides service, reduces the time customer services are interrupted, and improves the stability of customer services. Another core of this application is to provide a corresponding apparatus, device and computer-readable storage medium, which have the same beneficial effects.
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
The quality of service of a server system is always the attribute users care about most. The quality of a server system is generally measured in three respects: reliability, availability and serviceability. Reliability generally refers to the degree to which the server system runs without any problem; availability refers to the ability of the server system to remain usable after an error occurs; serviceability refers to the ability to locate and resolve a problem quickly after a hardware error occurs in the server.
This application provides a method for handling system crashes caused by PCIe device failure. The method can automatically isolate the faulty PCIe device when the server crashes, which improves the availability of the server.
Please refer to Fig. 1, which is a flowchart of the method for handling system crashes caused by PCIe device failure provided in this embodiment. The method mainly comprises:
Step s110: when a server system crash is detected, write internal register data carrying fault log information to a reserved fault-log storage space.
A certain amount of storage space is reserved for saving the crash-scene log. The reserved space should be as large as practical so that all data to be stored can be written.
The register data written to the reserved space may be all register data at the crash scene, or only the contents of the registers that recorded an error. To reduce the amount of data to be written, shorten the service downtime of the server system and minimise the space occupied by log data, preferably only the contents of all error registers inside the processor are stored.
To make the log data easier to read and analyse, a unified data format can be defined and the data written to the reserved space in that preset format. Of course, the data format may also be left unspecified; no limitation is imposed here.
It should be noted that in this embodiment the condition that triggers writing register data to the reserved space may be the detection of a server system crash, that is, without yet knowing what caused the crash. It may also be the detection of a crash caused by a PCIe device failure, that is, once the crash has been determined to be caused by a PCIe device failure. No limitation is imposed here.
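As an illustration of writing error-register contents in a preset format, the following is a minimal Python sketch. The source does not specify any layout: the magic value, the header fields, the record layout, the 0xFF padding and the function name `build_crash_log` are all assumptions made for this example.

```python
import struct
import zlib

# Hypothetical layout (not specified by the source):
#   header: 4-byte magic, record count, CRC32 of the payload
#   record: 32-bit register id followed by the 64-bit register value
MAGIC = b"CRSH"
HEADER = struct.Struct("<4sII")
RECORD = struct.Struct("<IQ")


def build_crash_log(error_registers, capacity):
    """Serialise {register_id: value} into an image for the reserved region."""
    payload = b"".join(
        RECORD.pack(reg_id, value)
        for reg_id, value in sorted(error_registers.items())
    )
    image = HEADER.pack(MAGIC, len(error_registers), zlib.crc32(payload)) + payload
    if len(image) > capacity:
        # The reserved space must be large enough to hold every record.
        raise ValueError("reserved region too small for the crash log")
    # Pad with 0xFF (an assumed erased value) so the image fills the region.
    return image + b"\xff" * (capacity - len(image))
```

Storing only the error registers keeps the image small, and the checksum gives the later parsing step a way to tell a complete log from a partially written one.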
Step s120: when the register data has been written, trigger a system reboot.
After the fault information has been written, no manual replacement of the faulty PCIe device is carried out at this point. This avoids the need for manual intervention to restore server system service after a PCIe device fails, greatly increases the time the server system provides service and reduces the time customer services are interrupted.
Step s130: when the system reboot is triggered, determine the faulty PCIe device.
The faulty PCIe device is determined during the system reboot by parsing the internal register data. It should be noted that locating the faulty PCIe device from the processor internal data may be completed in step s120: the faulty PCIe device is determined from the written data immediately after the data is written to the reserved space, and when the system reboot is later triggered, the already determined faulty PCIe device is obtained directly in step s130. Alternatively, it may be completed in step s130: after the system reboot is triggered, the data in the reserved space is read and the faulty PCIe device is determined. This embodiment does not limit this. In the latter case, analysing the data after the reboot has been triggered shortens the time the server system is out of service and improves its availability. Therefore, preferably, step s130 may specifically be: when the system reboot is triggered, locate the faulty PCIe device according to the internal register data to obtain the faulty PCIe device.
In addition, for the specific steps of determining the faulty PCIe device from the data in the reserved space, reference may be made to the related art; they are not repeated in this embodiment.
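The source leaves the parsing details to the related art. Purely as an illustration, the sketch below validates a log image and maps error registers to slots. The log layout, the `REGISTER_TO_SLOT` table and the rule that a non-zero register value indicates a fault are all hypothetical.

```python
import struct
import zlib

# Hypothetical layout: magic, record count, CRC32; then (register id, value) records.
MAGIC = b"CRSH"
HEADER = struct.Struct("<4sII")
RECORD = struct.Struct("<IQ")

# Hypothetical mapping from an error-register id to the PCIe slot it reports on.
REGISTER_TO_SLOT = {0x100: "PCIE Slot1", 0x101: "PCIE Slot2"}


def parse_crash_log(image):
    """Return {register_id: value}, or None if the region holds no valid log."""
    if len(image) < HEADER.size:
        return None
    magic, count, crc = HEADER.unpack_from(image, 0)
    end = HEADER.size + count * RECORD.size
    if magic != MAGIC or end > len(image):
        return None
    payload = image[HEADER.size:end]
    if zlib.crc32(payload) != crc:
        return None  # incomplete or corrupted log
    return dict(RECORD.unpack_from(payload, i * RECORD.size) for i in range(count))


def locate_faulty_slots(registers):
    """Treat any mapped register with a non-zero value as a reported fault."""
    return sorted(
        REGISTER_TO_SLOT[reg_id]
        for reg_id, value in registers.items()
        if value and reg_id in REGISTER_TO_SLOT
    )
```

Returning `None` for an invalid image lets the caller distinguish "no usable crash log" from "log present but no PCIe fault recorded".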
Step s140: set the faulty PCIe device to an unavailable state.
After the faulty PCIe device is set to the unavailable state, it no longer provides user services. Its influence on the server system is isolated automatically, which improves the availability of the OpenPOWER server.
Step s150: after the system reboot completes, run customer services on the available PCIe devices.
Although this loses some functions (those provided by the faulty PCIe device), the server system as a whole can keep working. A server system is generally configured with multiple PCIe devices of different types, such as SAS cards, network cards and GPU cards, and different PCIe devices carry different functions: for example, a SAS card mainly supports the user's data storage, and a network card provides network services. After the faulty module is set to unavailable, the other PCIe devices can continue to work. This avoids the need for manual intervention to restore server system service after a PCIe device fails, greatly increases the time the server system provides service and reduces the time customer services are interrupted.
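Steps s140 and s150 can be pictured with a small model. This is only a sketch: the `PcieDevice` class, its fields and the `isolate_faulty` function are names invented for this example, and real firmware would disable a device through platform-specific mechanisms rather than a Python flag.

```python
from dataclasses import dataclass


@dataclass
class PcieDevice:
    slot: str             # physical slot label, e.g. "Slot2"
    role: str             # function the device provides, e.g. "NIC"
    enabled: bool = True  # False models the unavailable state


def isolate_faulty(devices, faulty_slots):
    """Disable devices in faulty slots and return the ones still usable."""
    for device in devices:
        if device.slot in faulty_slots:
            device.enabled = False
    return [device for device in devices if device.enabled]
```

For example, with a SAS card in Slot1, a network card in Slot2 and a GPU card in Slot3, isolating Slot2 leaves storage and GPU functions running while only the network function of that card is lost.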
It should be noted that the crash handling method provided by this application is applicable to existing x86 server systems using Intel or AMD processors and to OpenPOWER server systems using IBM POWER processors, and is equally applicable to server systems that follow the same PCIe specification as those server systems.
Based on the above technical solution, in the method provided by this embodiment, when a server system crash is detected, internal register data carrying fault log information is written to a reserved fault-log storage space; this data can later be used to determine the faulty PCIe device. When the register data has been written, a system reboot is triggered. After the reboot is triggered, the faulty PCIe device is determined and set to an unavailable state, so that it is isolated automatically and its influence on the server system is avoided, which improves the availability of the server system. In addition, a faulty PCIe device set to the unavailable state can be replaced at any convenient time, which avoids the service interruption caused by waiting for manual repair when the system crashes and greatly increases the time the server system provides service. After the system reboot completes, customer services can continue to run on the available PCIe devices, which improves their stability.
Customer services essentially return to normal after the system reboot completes, but until the faulty PCIe device is dealt with, it remains in the unavailable state and the server system lacks the functions it provided. To restore the services of the faulty PCIe device as soon as possible while overall customer services are recovered in a short time, preferably, after the system reboot completes, a prompt indicating that the faulty PCIe device is abnormal can be output, so that the relevant technical staff can repair the device at a suitable time as soon as possible.
After the faulty PCIe device has been repaired, the previously set unavailable state is changed to the available state, indicating that the services of the device have formally resumed. To minimise the system memory occupied by the temporarily stored data, the data in the reserved storage space can be deleted when the faulty PCIe device is detected to be in the available state. Of course, the data may instead be moved to other free space; no limitation is imposed here.
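The clean-up rule above can be sketched in a few lines. The reserved space is modelled here as a `bytearray`, and both the 0xFF erased value and the function name `release_crash_log` are assumptions for illustration only.

```python
ERASED = 0xFF  # assumed erased value of the reserved region


def release_crash_log(region, device_available):
    """Erase the reserved region in place once the repaired device is usable."""
    if not device_available:
        return False  # keep the log while the device is still unavailable
    region[:] = bytes([ERASED]) * len(region)
    return True
```

Keeping the log until the device is back in the available state means the fault record is still there if the system reboots again before the repair.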
To deepen understanding of the crash handling method provided by this application, this embodiment uses an OpenPOWER server (a server system that uses an IBM POWER architecture processor) as an example; other server systems can refer to the description of this embodiment, which is not repeated here. Executing the crash handling process on an OpenPOWER server mainly involves the following functional modules: the OCC module (On-Chip Controller, a functional module built into the OpenPOWER processor), the BMC, and the BIOS (Basic Input Output System) parsing module.
The overall crash handling process carried out by the above modules mainly comprises the following steps:
When designing the BIOS Flash storage layout, reserve a storage area with a fixed address and sufficient capacity for saving the crash-scene log.
When the server system crashes, the OCC module built into the POWER processor detects that the processor has encountered an unrecoverable problem. The DUMP code of the OCC module then saves the contents of all Error registers inside the POWER processor and writes the data, in the designed format, to the storage area reserved in the BIOS Flash chip.
The OCC notifies the BMC that the server system has crashed and that log writing is complete. After receiving the message that the OCC has finished the DUMP work, the BMC initiates a system reboot.
When the BIOS parsing module detects that the system is starting, it first checks whether valid Error register data exists in the reserved space. If so, it parses the data and pinpoints the faulty component, for example, that the PCIe device in PCIE Slot2 has suffered an unrecoverable error. It then sets the faulty PCIe device to the Disable state and lets the system reboot continue. Because the faulty PCIe device has been disabled, it will not be used after the reboot, so the server system can boot into the OS normally and run the user's services.
If there is no valid Error register data in the reserved space, the system crash may not have been caused by a PCIe failure, and the system reboot can continue.
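The hand-off between the three modules can be traced with a toy simulation. This is not firmware code and uses no real OCC, BMC or BIOS interface: the class, its method names, the event labels and the convention that a non-zero status marks a faulty slot are all invented for this sketch.

```python
class OpenPowerCrashFlow:
    """Toy model of the OCC -> BMC -> BIOS hand-off; every name is illustrative."""

    def __init__(self):
        self.reserved = {}     # stands in for the reserved BIOS Flash region
        self.disabled = set()  # slots the BIOS has set to Disable
        self.events = []       # ordered trace of what happened

    def occ_unrecoverable_error(self, error_registers):
        # The OCC dump code saves the Error registers, then notifies the BMC.
        self.reserved = dict(error_registers)
        self.events.append("occ_dump")
        self.bmc_dump_complete()

    def bmc_dump_complete(self):
        # The BMC starts the reboot only after the dump has finished.
        self.events.append("bmc_reboot")
        self.bios_boot()

    def bios_boot(self):
        # A non-zero status is treated as "this slot reported an error" (assumption).
        faulty = {slot for slot, status in self.reserved.items() if status}
        if faulty:
            self.disabled |= faulty
            self.events.append("bios_disable")
        # With or without valid error data, the boot continues into the OS.
        self.events.append("boot_os")
```

Calling `occ_unrecoverable_error({"PCIE Slot2": 0x8000})` on a fresh instance records the dump, the BMC-initiated reboot, the disabling of Slot2 and the boot into the OS, in that order; calling `bios_boot()` with an empty reserved region goes straight to the OS boot.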
The OpenPOWER server crash handling method provided in this embodiment collects and parses the crash information and automatically isolates the faulty PCIe device from the system. This improves the availability of the OpenPOWER server, avoids the need for manual intervention to restore server system service after a PCIe device fails, greatly increases the time the server system provides service and reduces the time customer services are interrupted.
Please refer to Fig. 2, which is a structural block diagram of an apparatus for handling system crashes caused by PCIe device failure provided by an embodiment of this application. The apparatus mainly comprises: a data writing module 210, a reboot trigger module 220, a fault determination module 230, a fault marking module 240 and a service execution module 250.
The data writing module 210 is mainly configured to write internal register data carrying fault log information to a reserved fault-log storage space when a server system crash is detected;
the reboot trigger module 220 is mainly configured to trigger a system reboot when the register data has been written;
the fault determination module 230 is mainly configured to determine the faulty PCIe device when the system reboot is triggered, wherein the faulty PCIe device is obtained by parsing the internal register data;
the fault marking module 240 is mainly configured to set the faulty PCIe device to an unavailable state;
the service execution module 250 is mainly configured to run customer services on the available PCIe devices after the system reboot completes.
The data writing module may further comprise:
a locating submodule, configured to locate the error registers inside the CPU;
a writing submodule, configured to write the data stored in the error registers to the reserved storage space.
The fault determination module may specifically be a fault locating module, configured to: when the system reboot is triggered, locate the faulty PCIe device according to the internal register data to obtain the faulty PCIe device.
In addition, the apparatus provided in this embodiment may further comprise an abnormality prompt module, configured to output a prompt indicating that the faulty PCIe device is abnormal after the system reboot completes.
In addition, the apparatus provided in this embodiment may further comprise a data deletion module, configured to delete the data in the reserved storage space when the faulty PCIe device is detected to be in the available state.
The apparatus provided in this embodiment can increase the time the server system provides service, reduce the time customer services are interrupted and improve the stability of customer services.
This embodiment provides a device for handling system crashes caused by PCIe device failure. The device mainly comprises a memory and a processor. For details of the device, refer to the description of the method above.
The memory is mainly configured to store a program;
the processor is mainly configured to implement the steps of the above method for handling system crashes caused by PCIe device failure when executing the program.
Please refer to Fig. 3, which is a structural schematic diagram of the device provided in this embodiment. The device may differ considerably depending on configuration or performance. It may comprise one or more processors (central processing units, CPU) 322, a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) that store application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. A program stored in the storage medium 330 may comprise one or more modules (not marked in the figure), and each module may comprise a series of instruction operations on the data processing device. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the crash handling device 301, the series of instruction operations in the storage medium 330.
The crash handling device 301 may also comprise one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™ and FreeBSD™.
The steps of the method described above with reference to Fig. 1 can be implemented by the structure of this crash handling device.
This embodiment discloses a readable storage medium on which a program is stored. When the program is executed by a processor, the steps of the method for handling system crashes caused by PCIe device failure are implemented. For the method, refer to the embodiments above; it is not described again here.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, apparatus, device and readable storage medium for handling system crashes caused by PCIe device failure provided by this application have been described in detail above. Specific examples are used herein to explain the principle and implementation of this application, and the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to this application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of this application.

Claims (10)

1. A PCIE device failure system downtime processing method, characterized by comprising:
when server system downtime is detected, writing internal register data carrying fault log information to a reserved fault log storage space;
when writing of the register data is complete, triggering a system reboot;
when the system reboot is triggered, determining a faulty PCIE device, wherein the faulty PCIE device is obtained by parsing the internal register data;
setting the faulty PCIE device to an unavailable state;
after the system reboot is complete, executing customer services using the available PCIE devices.
2. The PCIE device failure system downtime processing method according to claim 1, characterized in that writing the processor-internal data to the reserved storage space comprises:
locating the error register inside the CPU;
writing the data stored in the error register to the reserved storage space.
3. The PCIE device failure system downtime processing method according to claim 1, characterized by further comprising:
after the system reboot is complete, outputting prompt information indicating that the faulty PCIE device is abnormal.
4. The PCIE device failure system downtime processing method according to claim 1, characterized by further comprising:
when the faulty PCIE device is detected to be in an available state, deleting the data in the reserved storage space.
5. The PCIE device failure system downtime processing method according to any one of claims 1 to 4, characterized in that determining the faulty PCIE device when the system reboot is triggered comprises:
when the system reboot is triggered, locating the faulty PCIE device according to the internal register data to obtain the faulty PCIE device.
6. A PCIE device failure system downtime processing apparatus, characterized by comprising:
a data writing module, configured to write internal register data carrying fault log information to a reserved fault log storage space when server system downtime is detected;
a reboot trigger module, configured to trigger a system reboot when writing of the register data is complete;
a fault determination module, configured to determine a faulty PCIE device when the system reboot is triggered, wherein the faulty PCIE device is obtained by parsing the internal register data;
a fault marking module, configured to set the faulty PCIE device to an unavailable state;
a service execution module, configured to execute customer services using the available PCIE devices after the system reboot is complete.
7. The PCIE device failure system downtime processing apparatus according to claim 6, characterized in that the data writing module comprises:
a locating submodule, configured to locate the error register inside the CPU;
a writing submodule, configured to write the data stored in the error register to the reserved storage space.
8. The PCIE device failure system downtime processing apparatus according to claim 6 or 7, characterized in that the fault determination module is specifically a fault locating module, and the fault locating module is configured to: when the system reboot is triggered, locate the faulty PCIE device according to the internal register data to obtain the faulty PCIE device.
9. A PCIE device failure system downtime processing device, characterized by comprising:
a memory, configured to store a program;
a processor, configured to implement, when executing the program, the steps of the PCIE device failure system downtime processing method according to any one of claims 1 to 5.
10. A readable storage medium, characterized in that a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the PCIE device failure system downtime processing method according to any one of claims 1 to 5 are implemented.
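
As an illustration only, the flow recited in claims 1 and 4 can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation disclosed in the application: the reserved fault log storage space is simulated with a JSON file, the register data is assumed to be a mapping from a device address to a status word in which a nonzero value indicates an error, and every function name and device address below is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_fault_log(error_registers, reserved_path):
    """Crash path: persist the register data to the reserved space.

    In a real system the reboot would be triggered only after this write
    has completed; the reboot itself is not modeled here.
    """
    reserved_path.write_text(json.dumps(error_registers))


def find_faulty_devices(reserved_path):
    """Reboot path: parse the saved register data and return faulty addresses."""
    if not reserved_path.exists():
        return set()
    saved = json.loads(reserved_path.read_text())
    # Assumed encoding: a nonzero status word means the device reported an error.
    return {address for address, status in saved.items() if status != 0}


def split_devices(all_devices, faulty):
    """Return (available, unavailable); faulty devices are marked unavailable."""
    available = [d for d in all_devices if d not in faulty]
    unavailable = [d for d in all_devices if d in faulty]
    return available, unavailable


def clear_fault_log(reserved_path):
    """Claim 4: once the device is available again, delete the saved data."""
    if reserved_path.exists():
        reserved_path.unlink()


def demo():
    """Run the whole flow once with made-up device addresses."""
    devices = ["0000:3b:00.0", "0000:5e:00.0", "0000:86:00.0"]
    registers = {"0000:3b:00.0": 0, "0000:5e:00.0": 1, "0000:86:00.0": 0}
    with tempfile.TemporaryDirectory() as tmp:
        reserved = Path(tmp) / "fault_log.json"
        save_fault_log(registers, reserved)        # downtime detected
        faulty = find_faulty_devices(reserved)     # after reboot is triggered
        available, unavailable = split_devices(devices, faulty)
        clear_fault_log(reserved)                  # device repaired or replaced
        return available, unavailable
```

Calling `demo()` walks through the steps in order: the register data is saved, the faulty device is derived from it, the device list is split so that services would run only on the available devices, and the saved data is cleared. Keeping the parsing step separate from the crash-time write mirrors the ordering in claim 1, where the faulty device is determined only after the reboot is triggered.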

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910209284.8A CN109947596A (en) 2019-03-19 2019-03-19 PCIE device failure system delay machine processing method, device and associated component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910209284.8A CN109947596A (en) 2019-03-19 2019-03-19 PCIE device failure system delay machine processing method, device and associated component

Publications (1)

Publication Number Publication Date
CN109947596A true CN109947596A (en) 2019-06-28

Family

ID=67010253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910209284.8A Pending CN109947596A (en) 2019-03-19 2019-03-19 PCIE device failure system delay machine processing method, device and associated component

Country Status (1)

Country Link
CN (1) CN109947596A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609778A (en) * 2019-08-16 2019-12-24 苏州浪潮智能科技有限公司 Method and system for storing server downtime log
CN111404725A (en) * 2020-02-27 2020-07-10 苏州浪潮智能科技有限公司 Method and system for isolating failure PCIE (peripheral component interface express) equipment
CN111400076A (en) * 2020-02-28 2020-07-10 苏州浪潮智能科技有限公司 Downtime restoration method, device, equipment and storage medium
CN111414268A (en) * 2020-02-26 2020-07-14 华为技术有限公司 Fault processing method and device and server
CN112699073A (en) * 2021-01-06 2021-04-23 同方计算机有限公司 PCIE card on-line replacement method and system with controllable BMC system
CN113127243A (en) * 2019-12-30 2021-07-16 美光科技公司 Real-time triggering of transcryption error logs
CN113722156A (en) * 2021-11-02 2021-11-30 四川华鲲振宇智能科技有限责任公司 N +1 redundancy backup method and system for PCIe equipment
CN114356644A (en) * 2022-03-18 2022-04-15 阿里巴巴(中国)有限公司 PCIE equipment fault processing method and device
CN115426244A (en) * 2022-08-09 2022-12-02 武汉虹信技术服务有限责任公司 Network equipment fault detection method based on big data
WO2022267349A1 (en) * 2021-06-22 2022-12-29 苏州浪潮智能科技有限公司 Register reading method and apparatus, device, and medium
CN116382968A (en) * 2023-06-05 2023-07-04 苏州浪潮智能科技有限公司 Fault detection method and device for external equipment
CN116737396A (en) * 2023-08-14 2023-09-12 苏州浪潮智能科技有限公司 Method, device, electronic equipment and storage medium for configuring maintainability of server
US11971776B2 (en) 2019-12-30 2024-04-30 Micron Technology, Inc. Real-time trigger to dump an error log

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102664702A (en) * 2012-04-05 2012-09-12 烽火通信科技股份有限公司 Protection mode of cross disc of M to N
US20160004608A1 (en) * 2014-07-01 2016-01-07 Bull Sas Method and device for synchronously running an application in a high availability environment
CN105893171A (en) * 2015-01-04 2016-08-24 伊姆西公司 Method and device for fault recovery in storage equipment
US20180039548A1 (en) * 2016-08-08 2018-02-08 International Business Machines Corporation Smart virtual machine snapshotting
CN108287775A (en) * 2018-03-01 2018-07-17 郑州云海信息技术有限公司 A kind of method, apparatus, equipment and the storage medium of server failure detection
CN108984332A (en) * 2018-06-22 2018-12-11 郑州云海信息技术有限公司 A kind of device and method of location-server delay machine failure

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609778A (en) * 2019-08-16 2019-12-24 苏州浪潮智能科技有限公司 Method and system for storing server downtime log
CN113127243A (en) * 2019-12-30 2021-07-16 美光科技公司 Real-time triggering of transcryption error logs
US11829232B2 (en) 2019-12-30 2023-11-28 Micron Technology, Inc. Real-time trigger to dump an error log
US11971776B2 (en) 2019-12-30 2024-04-30 Micron Technology, Inc. Real-time trigger to dump an error log
CN111414268A (en) * 2020-02-26 2020-07-14 华为技术有限公司 Fault processing method and device and server
CN111404725B (en) * 2020-02-27 2022-06-07 苏州浪潮智能科技有限公司 Method and system for isolating failure PCIE (peripheral component interface express) equipment
CN111404725A (en) * 2020-02-27 2020-07-10 苏州浪潮智能科技有限公司 Method and system for isolating failure PCIE (peripheral component interface express) equipment
CN111400076A (en) * 2020-02-28 2020-07-10 苏州浪潮智能科技有限公司 Downtime restoration method, device, equipment and storage medium
CN112699073A (en) * 2021-01-06 2021-04-23 同方计算机有限公司 PCIE card on-line replacement method and system with controllable BMC system
US11860718B2 (en) 2021-06-22 2024-01-02 Inspur Suzhou Intelligent Technology Co., Ltd. Register reading method and apparatus, device, and medium
WO2022267349A1 (en) * 2021-06-22 2022-12-29 苏州浪潮智能科技有限公司 Register reading method and apparatus, device, and medium
CN113722156A (en) * 2021-11-02 2021-11-30 四川华鲲振宇智能科技有限责任公司 N +1 redundancy backup method and system for PCIe equipment
CN113722156B (en) * 2021-11-02 2022-02-18 四川华鲲振宇智能科技有限责任公司 N +1 redundancy backup method and system for PCIe equipment
CN114356644B (en) * 2022-03-18 2022-06-14 阿里巴巴(中国)有限公司 PCIE equipment fault processing method and device
CN114356644A (en) * 2022-03-18 2022-04-15 阿里巴巴(中国)有限公司 PCIE equipment fault processing method and device
CN115426244A (en) * 2022-08-09 2022-12-02 武汉虹信技术服务有限责任公司 Network equipment fault detection method based on big data
CN115426244B (en) * 2022-08-09 2024-03-15 武汉虹信技术服务有限责任公司 Network equipment fault detection method based on big data
CN116382968B (en) * 2023-06-05 2023-08-18 苏州浪潮智能科技有限公司 Fault detection method and device for external equipment
CN116382968A (en) * 2023-06-05 2023-07-04 苏州浪潮智能科技有限公司 Fault detection method and device for external equipment
CN116737396A (en) * 2023-08-14 2023-09-12 苏州浪潮智能科技有限公司 Method, device, electronic equipment and storage medium for configuring maintainability of server
CN116737396B (en) * 2023-08-14 2023-11-03 苏州浪潮智能科技有限公司 Method, device, electronic equipment and storage medium for configuring maintainability of server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190628