CN106598790A - Server hardware failure detection method, apparatus of server, and server - Google Patents


Info

Publication number
CN106598790A
Authority
CN
China
Prior art keywords
server
hardware
output system
basic input
fault
Prior art date
Legal status
Pending
Application number
CN201510673005.5A
Other languages
Chinese (zh)
Inventor
李存龙
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201510673005.5A
Priority to PCT/CN2016/100618 (published as WO2017063505A1)
Publication of CN106598790A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention provides a server hardware failure detection method, an apparatus of a server, and the server. In the method, a basic input/output system (BIOS) apparatus of the server detects that the server has entered a startup phase. The BIOS apparatus then performs failure detection and analysis on the server hardware in each working phase, and the working phases include the startup phase. The hardware failure information detected by the BIOS apparatus covers failures pre-detected throughout the server's entire operating period. As a result, failures that arise during operation can be handled promptly, which improves the running stability and reliability of the server. The BIOS apparatus also stores the hardware failure information obtained through detection and analysis. Operators can therefore handle failures conveniently, and the information is stored and managed in a unified manner.

Description

Server hardware fault detection method, apparatus thereof, and server
Technical field
The present invention relates to the fields of computing and communications, and in particular to a server hardware fault detection method, an apparatus thereof, and a server.
Background
Current mid-range and high-end servers typically provide a partial "black box" function that records fault information when the operating system crashes. Such a function can record various kernel exceptions of the OS (operating system), such as kernel faults, resets, and exception-type information. It can also record some simple hardware errors in the SEL (System Event Log). Other approaches exist as well. Errors can be collected on site after a fault has occurred by out-of-band means, such as a JTAG joint test link. Device exceptions can also be monitored passively through an in-band exception-triggering mechanism, whose exception-recording module records only when an exception condition triggers it. These methods help maintenance personnel determine the cause of a fault to some extent, but they still have the following drawbacks:
1. The above methods record faults only when passively triggered. They lack active detection of the server, in particular active screening and monitoring of server hardware failures. Suppose the system starts and runs normally but service quality drops significantly. In that case the system does not trigger fault recording, fault information is missed, and maintenance personnel have difficulty tracing the fault during maintenance.
2. Fault information is recorded only when the system crashes or an exception is triggered. The ability to collect and analyze hardware faults while the system (service) is running is therefore seriously insufficient. This leaves the system with inadequate early-warning capability and reduces its stability and reliability.
3. The recorded fault information is overly simple and scattered, with no accurate, unified record management. Fault analysis cannot be completed in one pass, so extensive later analysis, screening, and cross-validation are needed to find the root cause.
4. Collecting fault information out of band is constrained by the need for specialized personnel, the on-site environment, information security, and similar factors. Environment deployment, personnel coordination, and environment restoration are all costly.
Therefore, current implementations of server fault-information recording can detect and record fault information only under specific conditions. The recorded information is simple and scattered and requires extensive later analysis.
Summary of the invention
The main technical problem to be solved by the present invention is to provide a server hardware fault detection method, an apparatus thereof, and a server. These address the prior art's inability to detect and record fault information in real time for server hardware in every working phase of the server.
To solve the above technical problem, the present invention provides a server hardware fault detection method, comprising:
The basic input/output system (BIOS) apparatus of a server detects that the server has entered a startup phase;
The BIOS apparatus starts hardware fault detection on the server in each working phase, the working phases including the startup phase;
The BIOS apparatus stores the hardware fault information obtained by the detection.
In an embodiment of the present invention, the startup phase includes an initialization phase. The hardware fault detection performed by the BIOS apparatus on the server in the initialization phase includes:
According to a hardware detection mechanism provided by the server, the BIOS apparatus pre-detects at least one of the server's CPU, memory, chipset, and power supply to obtain current hardware information. It filters out faulty hardware information from this hardware information and analyzes it to obtain the corresponding hardware fault information.
In another embodiment of the present invention, the startup phase further includes a device enumeration phase. The hardware fault detection performed by the BIOS apparatus on the server in the device enumeration phase includes:
The BIOS apparatus obtains status information and resource information for each piece of hardware on the server, and identifies from it the fault information of any hardware that has failed.
In another embodiment of the present invention, the startup phase is a cold-start phase or a warm-start phase.
In another embodiment of the present invention, the working phases further include at least one of an operating system pre-boot phase and an operating system service-running phase.
In another embodiment of the present invention, the working phases include the operating system pre-boot phase. In this case, the hardware fault detection performed by the BIOS apparatus on the server in the operating system pre-boot phase includes:
The BIOS apparatus pre-detects the server's external hardware devices that are about to be booted;
It obtains current hardware information for those hardware devices;
It filters out, from the current hardware information, the fault information of any hardware device that has failed;
When the working phases include the operating system service-running phase, the hardware fault detection performed by the BIOS apparatus on the server in that phase includes the following. The BIOS apparatus determines whether a hardware interrupt signal of the server has arrived. If so, the BIOS apparatus checks the hardware related to the operating system and obtains the fault information of that hardware.
In another embodiment of the present invention, the BIOS apparatus stores the fault information obtained by detection. Before it does so, the method further includes allocating, in the server's serial flash memory, a fault storage area for storing the hardware fault information.
To solve the above technical problem, the present invention further provides a basic input/output system (BIOS) apparatus, comprising:
A fault information detection trigger module, configured to detect whether the server has entered the startup phase;
A fault information detection module, configured to start hardware fault detection on the server in each working phase when the fault detection trigger module detects that the server has entered the startup phase, the working phases including the startup phase;
A fault information storage module, configured to store the hardware fault information obtained by the fault information detection module.
In another embodiment of the present invention, the apparatus further includes a storage setup module. Before the fault information storage module stores the hardware fault information, this module allocates a fault storage area for that information in the server's serial flash memory.
To solve the above technical problem, the present invention further provides a server including the BIOS apparatus described above.
The beneficial effects of the present invention are as follows:
The present invention provides a server hardware fault detection method, an apparatus thereof, and a server. When the server's basic input/output system (BIOS) apparatus detects that the server has entered the startup phase, it starts fault detection and analysis of the server hardware in each working phase and obtains the corresponding hardware fault information. Because the server's own BIOS apparatus is used, all hardware faults that may occur over the server's entire operating cycle can be detected. This improves the comprehensiveness and accuracy of hardware fault detection and makes unified storage and management of hardware fault information easier. Maintenance personnel servicing the server can accurately obtain the hardware fault information and learn the location of the faulty hardware and the cause of the fault. This further improves the stability and reliability of the server.
Brief description of the drawings
Fig. 1 is a flowchart of the server hardware fault detection method provided by the present invention;
Fig. 2 is a flowchart of hardware fault detection in the server initialization phase provided by the present invention;
Fig. 3 is a flowchart of hardware fault detection in the device enumeration phase of the present invention;
Fig. 4 is a flowchart of hardware fault detection in the operating system pre-boot phase of the present invention;
Fig. 5 is a flowchart of hardware fault detection in the operating system service-running phase of the present invention;
Fig. 6 is a structural block diagram of the basic input/output system apparatus provided by the present invention.
Detailed description of the embodiments
The present invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
Embodiment 1:
Fig. 1 is a flowchart of the server hardware fault detection method provided by the present invention. In the method of this embodiment, the BIOS apparatus actively performs fault detection on the server hardware. "Actively" means that the BIOS apparatus follows a detection mechanism preset for the server. Under that mechanism it either performs fault detection on the server hardware immediately when the server starts, or performs hardware fault detection in every working phase of the server. The method specifically includes the following steps:
S101: The BIOS apparatus of the server detects that the server has entered the startup phase;
In this embodiment, the startup phase of the server is a cold-start phase or a warm-start phase. When the startup phase is a cold-start phase, the BIOS apparatus can detect it in several ways, including but not limited to the following. It can detect whether the power button on the server has been pressed, or whether the server's power supply circuit is connected to the server power interface. It can also check the power status flag bit. If any of these indicates startup, the server has entered the startup phase and is running, and step S102 is executed; otherwise, detection continues.
When the startup phase is a warm-start phase, the BIOS apparatus detects whether a reset/start signal has been input to the server. If so, the server begins a warm start and step S102 is executed; otherwise, detection continues. The reset signal can come from three sources. It can be triggered by hardware, for example by pressing a reset button. It can be input by software, for example periodically by code or a tool. It can also be input actively by a user through a command or a "Restart" button.
S102: The BIOS apparatus starts hardware fault detection on the server in each working phase, the working phases including the startup phase;
S103: The BIOS apparatus stores the hardware fault information obtained by the detection.
In this embodiment, before step S103, the method further includes allocating, in the server's serial flash memory, a fault storage area for storing the hardware fault information. The fault information recorded and stored by the BIOS apparatus includes the time, the event that occurred, the severity, the specific location or fault details, and the suggested handling.
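A record with the five fields listed above fits naturally into fixed-size slots of a reserved area. The following Python sketch shows one such layout, and every part of it is an assumption:

- The byte format (`RECORD_FMT`), field widths, severity levels, and the overwrite-oldest policy are all hypothetical.
- The flash region is simulated with a `bytearray`.
- It ignores the erase-before-write rules of real serial flash.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    ERROR = 2
    FATAL = 3


@dataclass
class FaultRecord:
    timestamp: int      # time of the fault, seconds since an arbitrary epoch
    severity: Severity
    event: str          # the event that occurred
    location: str       # specific location or fault details
    suggestion: str     # suggested handling


# Hypothetical layout: u64 time, u8 severity, then three NUL-padded UTF-8 fields.
# struct truncates or zero-pads "Ns" fields to exactly N bytes.
RECORD_FMT = "<QB32s48s48s"
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 8 + 1 + 32 + 48 + 48 = 137 bytes


def pack_record(rec: FaultRecord) -> bytes:
    return struct.pack(RECORD_FMT, rec.timestamp, int(rec.severity),
                       rec.event.encode(), rec.location.encode(),
                       rec.suggestion.encode())


def unpack_record(raw: bytes) -> FaultRecord:
    ts, sev, ev, loc, sug = struct.unpack(RECORD_FMT, raw)

    def text(b: bytes) -> str:
        return b.rstrip(b"\0").decode("utf-8", errors="ignore")

    return FaultRecord(ts, Severity(sev), text(ev), text(loc), text(sug))


class FailureStorageArea:
    """Fixed-size fault area simulated in RAM; the oldest record is overwritten."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf = bytearray(capacity * RECORD_SIZE)
        self.next_slot = 0
        self.count = 0

    def store(self, rec: FaultRecord) -> None:
        off = self.next_slot * RECORD_SIZE
        self.buf[off:off + RECORD_SIZE] = pack_record(rec)
        self.next_slot = (self.next_slot + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def records(self) -> List[FaultRecord]:
        """Return the stored records, oldest first."""
        first = (self.next_slot - self.count) % self.capacity
        out = []
        for i in range(self.count):
            off = ((first + i) % self.capacity) * RECORD_SIZE
            out.append(unpack_record(bytes(self.buf[off:off + RECORD_SIZE])))
        return out
```

Fixed-width slots keep the area's size known when it is allocated, which matches the idea of reserving the area before anything is stored.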
In this embodiment, the above steps detect and store the hardware fault information. Afterwards, maintenance personnel who need to service the server can access the stored information directly, either through an out-of-band console connected to the storage area or through a network user interface. This makes it easy to trace when and how a fault occurred, locate the fault on site, and replace the faulty hardware. For example, a particular CPU or memory module can be replaced directly, or a faulty bus interface card can be swapped out. In mid-range and high-end servers, hot-plug technology can keep the system running without interruption. This includes, but is not limited to, CPU hot-plug, memory hot-plug, and bus interface hot-plug. The goal is early discovery, early warning, early prevention, and early handling. Valid fault information can be obtained even when the server hangs abnormally during a cold or warm start and its peripherals are unusable. Examples include an unreachable network port, a blank screen, or an unresponsive keyboard and mouse.
In this embodiment, operations and maintenance personnel obtain the hardware fault information through the console. Besides handling it promptly, they can also dump it to another external storage device.
In this embodiment, the startup phase includes an initialization phase. The steps by which the BIOS apparatus performs fault detection on the server hardware in the initialization phase are shown in Fig. 2 and specifically include:
S201: The BIOS apparatus initializes the CPU, memory, chipset, and power supply;
S202: The BIOS apparatus detects and obtains current hardware information for at least one of the CPU, memory, chipset, and power supply;
In this embodiment, the BIOS apparatus follows the hardware detection mechanism provided by the server. It pre-detects at least one of the server's CPU, memory, chipset, and power supply to obtain current hardware information. It then filters out the faulty hardware information and analyzes it to obtain the corresponding hardware fault information.
Specifically, in this embodiment, the BIOS apparatus may detect that the server has entered the initialization phase. It then actively initiates detection of the failures and configuration of server hardware such as the CPU, memory, chipset, and power supply, and obtains the corresponding hardware information. It can do so in several ways:
- through the BIOS apparatus itself;
- through active stress testing by the BIOS apparatus;
- through the hardware detection mechanisms provided by the CPU and chipset;
- through in-band integrated tools, such as a MEMTEST tool or a system event log test tool.

It then performs preliminary analysis and judgment, preliminary statistics, pre-screening, scanning, and hardware tolerance checks on this information. It collects the test results and filters out valid fault information for detailed recording and storage, including information that would later trigger system exceptions. If a system exception occurs during this phase, the server has therefore already obtained and recorded detailed hardware failure information before the exception. In this phase, the recorded and stored fault information includes but is not limited to:
- CPU errors and alarms;
- CBO (Caching Agent) errors and alarms;
- QPI (QuickPath Interconnect) errors and alarms;
- IIO (Integrated I/O) port errors and alarms;
- HA (Home Agent) errors and alarms;
- IMC (Integrated Memory Controller) errors and alarms;
- PCU (Power Control Unit) errors and alarms;
- power supply and voltage errors and alarms;
- memory errors and alarms, covering the memory modules themselves, memory channels, memory population rules, memory voltage, memory incompatibility, configuration, and so on.
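The pre-detect, filter, and analyze sequence can be pictured as collecting per-unit results and keeping only the failures. The following Python sketch is illustrative. The probe functions are hypothetical placeholders for real checks such as a memory test, a stress test, or a power-good read. Its "analysis" is only grouping by component, a much simpler step than the statistics and screening the text describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HardwareInfo:
    component: str   # "CPU", "memory", "chipset" or "power"
    unit: str        # e.g. "CPU0", "DIMM_A1"
    ok: bool         # result of the pre-detection check
    detail: str = ""


# A probe returns current hardware information for one component (hypothetical).
Probe = Callable[[], List[HardwareInfo]]


def pre_detect(probes: Dict[str, Probe]) -> List[HardwareInfo]:
    """Run every available probe and collect the current hardware information."""
    info: List[HardwareInfo] = []
    for probe in probes.values():
        info.extend(probe())
    return info


def analyze_faults(info: List[HardwareInfo]) -> Dict[str, List[str]]:
    """Keep only failed entries and group them by component."""
    faults: Dict[str, List[str]] = {}
    for item in info:
        if not item.ok:
            faults.setdefault(item.component, []).append(f"{item.unit}: {item.detail}")
    return faults
```

For example, a memory probe reporting a population-rule violation on one DIMM yields `{"memory": ["DIMM_A1: population rule violated"]}`, while healthy units are dropped.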
In this embodiment, the startup phase further includes a device enumeration phase. The flow of hardware fault detection in this phase is shown in Fig. 3 and specifically includes the following steps:
S301: The BIOS apparatus starts device enumeration;
S302: The BIOS apparatus detects and obtains current device information;
Further, the BIOS apparatus obtains the status information and resource information of each piece of hardware on the server and identifies from it the fault information of any hardware or software that has failed. In this phase, the fault information includes but is not limited to:
- device access errors, including illegal memory and I/O requests;
- third-party firmware (Option ROM) that was not executed, for example because of insufficient space or an incorrect format;
- devices disabled because they are damaged.

Specifically, in this embodiment, the server issues probe tasks to bus interface peripherals (PCIe, Peripheral Component Interconnect Express) and computes their resource requirements. During this process the BIOS apparatus, following its detection mechanism, identifies the third-party firmware (Option ROM) identifier, vendor information, device class information, and capacity as defined by industry specifications. It also checks hardware status indications, such as link status and bandwidth information. From this information it identifies and stores the fault information of faulty hardware.
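A few of the checks named above can be expressed as simple rules over enumerated devices. The following Python sketch works on values passed in directly, which is an assumption: real firmware would read the Vendor ID register and the PCIe Link Capabilities and Link Status registers from configuration space. It relies on two well-known conventions. A read of an absent or unresponsive function returns all ones (Vendor ID 0xFFFF). A PCI expansion ROM image begins with the bytes 0x55 0xAA. The `rom_space_available` flag is a hypothetical stand-in for the result of resource allocation.

```python
from dataclasses import dataclass
from typing import List, Optional

PCI_VENDOR_NONE = 0xFFFF            # config reads of an absent function return all ones
OPTION_ROM_SIGNATURE = b"\x55\xaa"  # first two bytes of a PCI expansion ROM header


@dataclass
class EnumeratedDevice:
    bdf: str                          # bus:device.function
    vendor_id: int
    max_link_width: int               # lanes the device is capable of
    negotiated_link_width: int        # lanes actually trained
    link_up: bool
    option_rom: Optional[bytes] = None
    rom_space_available: bool = True  # hypothetical result of resource allocation


def check_device(dev: EnumeratedDevice) -> List[str]:
    """Return fault descriptions for one enumerated device (empty if healthy)."""
    if dev.vendor_id == PCI_VENDOR_NONE:
        return ["device access error: configuration space reads return all ones"]
    faults: List[str] = []
    if not dev.link_up:
        faults.append("link down")
    elif dev.negotiated_link_width < dev.max_link_width:
        faults.append(f"link trained at x{dev.negotiated_link_width}, "
                      f"capable of x{dev.max_link_width}")
    if dev.option_rom is not None:
        if not dev.option_rom.startswith(OPTION_ROM_SIGNATURE):
            faults.append("option ROM not executed: bad signature")
        elif not dev.rom_space_available:
            faults.append("option ROM not executed: insufficient space")
    return faults
```

A narrower negotiated link is flagged here for illustration. In practice it may be expected, for example when a x16 card sits in a x8 slot, so a real policy would compare against the slot's capability.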
In this embodiment, the working phases further include at least one of an operating system pre-boot phase and an operating system service-running phase. Figs. 4 and 5 are flowcharts of hardware fault detection in the operating system pre-boot phase and the operating system service-running phase, respectively.
As shown in Fig. 4, hardware fault detection and analysis in the operating system pre-boot phase includes the following steps:
S401: The BIOS apparatus pre-detects the server's external hardware devices that are about to be booted;
S402: Obtain current hardware information for the hardware devices;
S403: Filter out, from the current hardware information, the fault information of any hardware device that has failed;
In this embodiment, the server's external hardware devices include but are not limited to hard disks, server network ports, and device boot attributes. The fault information includes but is not limited to:
- no bootable device;
- hard disk (or USB flash drive) damage, including a corrupted MBR partition;
- PXE network boot failure, including port information and an unreachable network ping;
- abnormal ME (Management Engine) working status.

Preferably, the BIOS apparatus performs fault detection on the hard disk partition in this phase as follows:
- It actively initiates a detection request and reads the MBR (Master Boot Record) data of the hard disk (or USB flash drive). It analyzes the boot flag, the end marker, and the error-message data fields, and uses the hardware detection mechanism provided by the server to determine whether the disk is bootable or damaged.
- It issues self-test commands to determine the communication link state and working mode between the server and the host.
- It checks whether the network is connected through DHCP (Dynamic Host Configuration Protocol) communication.
- It enumerates the board's boot devices to check whether any bootable device exists.
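The MBR analysis described above can be made concrete using the standard MBR layout:
- four 16-byte partition entries start at offset 446;
- the first byte of each entry is the boot flag, which must be 0x00 or 0x80 (active);
- byte 4 of each entry is the partition type;
- bytes 510 and 511 hold the 0x55 0xAA end signature.

The following Python sketch checks only these fields and is not the source's full detection mechanism. It deliberately does not flag a disk with no active partition, because GPT protective MBRs and UEFI booting do not rely on the active flag.

```python
from typing import List

MBR_SIZE = 512
PARTITION_TABLE_OFFSET = 446
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a valid MBR


def check_mbr(sector: bytes) -> List[str]:
    """Return the problems found in a 512-byte MBR sector (empty if none)."""
    if len(sector) < MBR_SIZE:
        return ["short read: MBR sector incomplete"]
    problems: List[str] = []
    if sector[510:512] != BOOT_SIGNATURE:
        problems.append("missing 0x55AA boot signature")
    active = 0
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * PARTITION_ENTRY_SIZE
        boot_flag, part_type = sector[start], sector[start + 4]
        if boot_flag not in (0x00, 0x80):
            problems.append(f"partition {i}: invalid boot flag 0x{boot_flag:02x}")
        elif boot_flag == 0x80:
            active += 1
            if part_type == 0x00:
                problems.append(f"partition {i}: active but unused")
    if active > 1:
        problems.append("more than one active partition")
    return problems
```

A sector read from a healthy disk returns an empty list. An all-zero sector, typical of a wiped or unreadable disk, reports the missing signature.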
As shown in Fig. 5, hardware fault detection and analysis in the operating system service-running phase includes the following steps:
S501: Determine whether a hardware interrupt signal of the server has arrived;
S502: If so, the BIOS apparatus checks the hardware related to the operating system;
S503: Obtain the fault information of the hardware;
In the above fault detection and analysis, when a hardware interrupt signal is determined to have arrived, the BIOS apparatus performs fault detection on the hardware related to the service operation. It analyzes, classifies, and counts the detected hardware fault information and then stores it. The fault information detected in this phase includes but is not limited to:
- CPU errors and alarms;
- CBO errors and alarms;
- QPI errors and alarms;
- VT-d errors and alarms;
- IIO port errors and alarms;
- memory errors and alarms;
- PCIe errors and alarms;
- PCU errors and alarms;
- Ubox (Utility Box) errors and alarms.

Preferably, during hardware fault detection in this phase, the BIOS apparatus prepares the platform as follows:
- it enables the MCA (Machine Check Architecture) function and the AER (Advanced Error Reporting) function;
- it turns on the error detection bank (Machine Check Error Bank) switch for each component;
- it installs a fault identification and classification function and a fault-handling hook function for each component.

When an MCE (Machine-Check Exception) occurs, the hardware pulls the error status pin low and generates a system management interrupt (SMI). The BIOS apparatus then takes control. Its hardware fault identification and classification function reads the error status registers of the CPU and bridge chips to obtain the specific information in the machine check error banks. This information is parsed in detail according to the chip manuals, which isolates and interprets the specific hardware error.
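The bank-reading step can be illustrated by decoding the architectural fields of an `IA32_MCi_STATUS` value. The bit positions follow the machine-check chapter of the Intel SDM:
- bit 63: VAL (valid);
- bit 62: OVER (overflow);
- bit 61: UC (uncorrected);
- bit 60: EN (enabled);
- bit 59: MISCV (MISC register valid);
- bit 58: ADDRV (ADDR register valid);
- bit 57: PCC (processor context corrupt);
- bits 15:0: MCA error code;
- bits 31:16: model-specific error code.

The following Python sketch takes each status as a plain integer. Firmware would obtain it with RDMSR inside the SMI handler. The three-level severity is a simplification of this sketch. Real classification also uses the S and AR bits on processors that support recoverable errors, and it decodes the model-specific fields from the chip manual, as the text notes.

```python
from dataclasses import dataclass

# Architectural IA32_MCi_STATUS bit positions (Intel SDM, Machine-Check Architecture).
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57


def _bit(value: int, pos: int) -> bool:
    return bool((value >> pos) & 1)


@dataclass
class McaBankError:
    bank: int
    valid: bool
    overflow: bool
    uncorrected: bool
    enabled: bool
    misc_valid: bool
    addr_valid: bool
    context_corrupt: bool
    mca_error_code: int        # bits 15:0, architectural error code
    model_specific_code: int   # bits 31:16, needs the processor documentation

    @property
    def severity(self) -> str:
        """Simplified classification used only in this sketch."""
        if not self.valid:
            return "none"
        if self.context_corrupt:
            return "fatal"
        return "uncorrected" if self.uncorrected else "corrected"


def decode_mci_status(bank: int, status: int) -> McaBankError:
    return McaBankError(
        bank=bank,
        valid=_bit(status, VAL),
        overflow=_bit(status, OVER),
        uncorrected=_bit(status, UC),
        enabled=_bit(status, EN),
        misc_valid=_bit(status, MISCV),
        addr_valid=_bit(status, ADDRV),
        context_corrupt=_bit(status, PCC),
        mca_error_code=status & 0xFFFF,
        model_specific_code=(status >> 16) & 0xFFFF,
    )
```

A decoded `McaBankError` can then be turned into a fault record (time, event, severity, location, suggestion) and written to the fault storage area.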
Embodiment 2:
This embodiment provides a BIOS apparatus. It should be understood that the BIOS apparatus can be provided in any server to perform hardware fault detection on the server in any working phase. As shown in Fig. 6, the BIOS apparatus 60 includes:
A fault information detection trigger module 61, configured to detect that the server has entered the startup phase;
A fault information detection module 62, configured to start hardware fault detection on the server in each working phase, the working phases including the startup phase;
A fault information storage module 63, configured to store the hardware fault information obtained by the BIOS apparatus through detection and analysis.
In this embodiment, in the startup phase of the server, the fault information detection module 62 follows the hardware detection mechanism provided by the server. It pre-detects at least one of the server's CPU, memory, chipset, and power supply to obtain current hardware information. It then filters out the faulty hardware information and analyzes it to obtain the corresponding hardware fault information.
In the device enumeration phase of the server, the fault information detection module 62 obtains the status information and resource information of each piece of hardware on the server and identifies from it the fault information of any hardware that has failed.
In the operating system pre-boot phase of the server, the fault information detection module 62 pre-detects the server's external hardware devices that are about to be booted;
obtains current hardware information for those hardware devices;
and filters out, from the current hardware information, the fault information of any hardware device that has failed.
In the operating system service-running phase of the server, the hardware fault detection performed by the fault information detection module 62 includes the following. The BIOS apparatus determines whether a hardware interrupt signal of the server has arrived. If so, the BIOS apparatus checks the hardware related to the operating system and obtains the fault information of that hardware.
In this embodiment, the apparatus further includes a storage setup module 64. Before the fault information storage module stores the fault information, this module allocates a fault storage area for the hardware fault information in the server's serial flash memory.
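The module structure of apparatus 60 can be mirrored in a few small classes. The following Python sketch maps each class to one numbered module, but it is hypothetical throughout:
- The stage names, the `on_startup` and `on_hardware_interrupt` entry points, and the string trigger signals are illustrative, not the source's interfaces.
- The flash storage area is stood in for by a bounded `deque` that drops the oldest entries.
- Boot-time stages run in sequence at startup, while the runtime stage runs on each hardware interrupt, following S501 to S503.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Detector = Callable[[], List[str]]
BOOT_STAGES = ("initialization", "device enumeration", "OS pre-boot")
RUNTIME_STAGE = "OS runtime"


class StorageSetupModule:                 # module 64
    def allocate(self, capacity: int) -> Deque[Tuple[str, str]]:
        # Stand-in for a fault area on serial flash: bounded, oldest entries dropped.
        return deque(maxlen=capacity)


class FaultStorageModule:                 # module 63
    def __init__(self, region: Deque[Tuple[str, str]]) -> None:
        self.region = region

    def store(self, stage: str, faults: List[str]) -> None:
        self.region.extend((stage, f) for f in faults)


class FaultDetectionModule:               # module 62
    def __init__(self, detectors: Dict[str, Detector]) -> None:
        self.detectors = detectors

    def run(self, stage: str) -> List[str]:
        detector = self.detectors.get(stage)
        return detector() if detector else []


class FaultDetectionTriggerModule:        # module 61
    def startup_detected(self, signal: str) -> bool:
        return signal in ("cold", "warm")


class BiosApparatus:                      # apparatus 60
    def __init__(self, detectors: Dict[str, Detector], capacity: int) -> None:
        self.trigger = FaultDetectionTriggerModule()
        self.detection = FaultDetectionModule(detectors)
        # The storage area is allocated before anything is stored.
        self.storage = FaultStorageModule(StorageSetupModule().allocate(capacity))

    def on_startup(self, signal: str) -> None:
        if not self.trigger.startup_detected(signal):
            return
        for stage in BOOT_STAGES:
            self.storage.store(stage, self.detection.run(stage))

    def on_hardware_interrupt(self) -> None:
        self.storage.store(RUNTIME_STAGE, self.detection.run(RUNTIME_STAGE))
```

Keeping detection and storage in separate modules lets each phase's detector be replaced without touching how records are kept, in line with the unified storage the description aims for.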
The present invention further provides a server that includes the BIOS apparatus described above.
The technical solution provided by the present invention can be widely applied to devices such as computers and network communication equipment. The BIOS apparatus performs fault detection on the hardware devices throughout the server's entire operating cycle. This helps prevent the server from failing during operation and improves the stability and reliability of server operation.
The above is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. A person of ordinary skill in the technical field of the present invention may make several simple deductions or substitutions without departing from the inventive concept. All such deductions and substitutions should be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A server hardware fault detection method, characterized by comprising:
detecting, by a basic input/output system (BIOS) apparatus of a server, that the server has entered a startup phase;
starting, by the BIOS apparatus, hardware fault detection on the server in each working phase, the working phases including the startup phase;
storing, by the BIOS apparatus, the hardware fault information obtained by the detection.
2. The server hardware fault detection method according to claim 1, characterized in that the startup phase includes an initialization phase, and the hardware fault detection performed by the BIOS apparatus on the server in the initialization phase includes:
pre-detecting, by the BIOS apparatus according to a hardware detection mechanism provided by the server, at least one of the server's CPU, memory, chipset, and power supply to obtain current hardware information; filtering out faulty hardware information from the hardware information; and analyzing it to obtain the corresponding hardware fault information.
3. The server hardware fault detection method according to claim 2, characterized in that the startup phase further includes a device enumeration phase, and the hardware fault detection performed by the BIOS apparatus on the server in the device enumeration phase includes:
obtaining, by the BIOS apparatus, status information and resource information of each piece of hardware on the server, and identifying from it the fault information of any hardware that has failed.
4. The server hardware fault detection method according to any one of claims 1 to 3, characterized in that the startup phase is a cold-start phase or a warm-start phase.
5. The server hardware fault detection method according to any one of claims 1 to 3, characterized in that the working phases further include at least one of an operating system pre-boot phase and an operating system service-running phase.
6. The server hardware fault detection method according to claim 5, characterized in that, when the working phases include the operating system pre-boot phase, the hardware fault detection performed by the BIOS apparatus on the server in the operating system pre-boot phase includes:
pre-detecting, by the BIOS apparatus, the server's external hardware devices that are about to be booted;
obtaining current hardware information for the hardware devices;
filtering out, from the current hardware information, the fault information of any hardware device that has failed;
and in that, when the working phases include the operating system service-running phase, the hardware fault detection performed by the BIOS apparatus on the server in the operating system service-running phase includes: determining, by the BIOS apparatus, whether a hardware interrupt signal of the server has arrived; if so, checking, by the BIOS apparatus, the hardware related to the operating system; and obtaining the fault information of that hardware.
7. The server hardware fault detection method as claimed in any one of claims 1-3, characterized in that, before the basic input output system device stores the detected fault information, the method further comprises allocating, on a serial flash memory of the server, a fault storage region for storing the hardware fault information.
8. A basic input output system device, characterized by comprising:
A fault detection trigger module, configured to detect whether the server has entered a startup stage;
A fault information detection module, configured to start performing hardware fault detection on the server in each working stage when the fault detection trigger module detects that the server has entered the startup stage, the working stages including the startup stage;
A fault information storage module, configured to store the hardware fault information detected by the fault information detection module.
9. The basic input output system device as claimed in claim 8, characterized by further comprising a storage setup module, configured to allocate, on a serial flash memory of the server, a fault storage region for storing the hardware fault information before the fault information storage module stores the hardware fault information.
10. A server, characterized by comprising the basic input output system device as claimed in claim 8 or 9.
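The staged flow of claims 2 to 9 can be sketched as a small model. This is a hypothetical Python illustration, not the applicant's firmware: the class names, the status convention (0 means healthy, any other value is a fault code), and the 20-byte record layout are assumptions, and a `bytearray` stands in for the fault storage region reserved on the serial flash memory.

```python
"""Hypothetical model of staged BIOS hardware fault detection (claims 2-9).

Names, the status convention and the record layout are illustrative
assumptions; real firmware would be written in C against BIOS/UEFI interfaces.
"""
import struct
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    INITIALIZATION = 1      # claim 2: CPU, memory, chipset, power supply
    DEVICE_ENUMERATION = 2  # claim 3: status/resource info of each device
    OS_PREBOOT = 3          # claim 6: devices about to be booted
    OS_RUNTIME = 4          # claim 6: hardware that raised an interrupt


STARTUP_STAGES = {Stage.INITIALIZATION, Stage.DEVICE_ENUMERATION}


@dataclass(frozen=True)
class FaultRecord:
    stage: Stage
    component: str
    code: int


class FaultStore:
    """Fixed-size byte region standing in for the area reserved on serial flash."""

    RECORD = struct.Struct("<B15sI")  # stage id, ASCII name, fault code = 20 bytes

    def __init__(self, capacity: int) -> None:
        # Storage setup (claims 7, 9): reserve the whole region before any write.
        self.region = bytearray(self.RECORD.size * capacity)
        self.capacity = capacity
        self.count = 0

    def store(self, rec: FaultRecord) -> bool:
        if self.count >= self.capacity:
            return False  # region full; the claims leave this case open
        # Names longer than 15 bytes are truncated by the "15s" field.
        self.RECORD.pack_into(self.region, self.count * self.RECORD.size,
                              rec.stage.value, rec.component.encode("ascii"), rec.code)
        self.count += 1
        return True

    def records(self) -> list[FaultRecord]:
        out = []
        for i in range(self.count):
            sid, name, code = self.RECORD.unpack_from(self.region, i * self.RECORD.size)
            out.append(FaultRecord(Stage(sid), name.rstrip(b"\0").decode("ascii"), code))
        return out


class BiosFaultDetector:
    def __init__(self, store: FaultStore) -> None:
        self.store = store
        self.triggered = False

    def on_stage(self, stage: Stage) -> None:
        # Trigger module (claim 8): arm detection once a startup stage is entered.
        if stage in STARTUP_STAGES:
            self.triggered = True

    def detect(self, stage: Stage, hardware: dict[str, int]) -> int:
        """Keep only entries with a nonzero status (assumed fault code) and store them."""
        if not self.triggered:
            return 0
        stored = 0
        for component, status in hardware.items():
            if status != 0 and self.store.store(FaultRecord(stage, component, status)):
                stored += 1
        return stored

    def on_hardware_interrupt(self, component: str, status: int) -> int:
        # OS runtime (claim 6): examine only the hardware behind the interrupt.
        return self.detect(Stage.OS_RUNTIME, {component: status})


if __name__ == "__main__":
    bios = BiosFaultDetector(FaultStore(capacity=8))
    bios.on_stage(Stage.INITIALIZATION)
    bios.detect(Stage.INITIALIZATION, {"cpu0": 0, "dimm_a1": 0x21, "psu1": 0})
    bios.detect(Stage.DEVICE_ENUMERATION, {"nic0": 0, "raid0": 0x05})
    bios.on_hardware_interrupt("dimm_b2", 0x33)
    for rec in bios.store.records():
        print(rec.stage.name, rec.component, hex(rec.code))
```

Reserving a fixed-size region before detection starts mirrors the ordering in claims 7 and 9. The claims do not say what happens when that region is full, so this sketch simply rejects further records.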
CN201510673005.5A 2015-10-16 2015-10-16 Server hardware failure detection method, apparatus of server, and server Pending CN106598790A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510673005.5A CN106598790A (en) 2015-10-16 2015-10-16 Server hardware failure detection method, apparatus of server, and server
PCT/CN2016/100618 WO2017063505A1 (en) 2015-10-16 2016-09-28 Method for detecting hardware fault of server, apparatus thereof, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510673005.5A CN106598790A (en) 2015-10-16 2015-10-16 Server hardware failure detection method, apparatus of server, and server

Publications (1)

Publication Number Publication Date
CN106598790A true CN106598790A (en) 2017-04-26

Family

ID=58517771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510673005.5A Pending CN106598790A (en) 2015-10-16 2015-10-16 Server hardware failure detection method, apparatus of server, and server

Country Status (2)

Country Link
CN (1) CN106598790A (en)
WO (1) WO2017063505A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291584A (en) * 2017-06-27 2017-10-24 郑州云海信息技术有限公司 Chassis failure detection method and system
CN109117299A (en) * 2017-06-23 2019-01-01 佛山市顺德区顺达电脑厂有限公司 Error detecting device for server and debugging method thereof
CN109426606A (en) * 2017-08-23 2019-03-05 东软集团股份有限公司 Kernel failure diagnosis information processing method, device, storage medium and electronic equipment
CN109697144A (en) * 2018-11-22 2019-04-30 合肥联宝信息技术有限公司 Hard disk detection method for electronic equipment, and electronic equipment
CN109783283A (en) * 2018-12-11 2019-05-21 中国长城科技集团股份有限公司 Method and device for processing hardware detection information, and terminal device
CN109918257A (en) * 2017-12-12 2019-06-21 杭州海康威视数字技术股份有限公司 Hard disk exception handling method and device
CN111722954A (en) * 2020-06-30 2020-09-29 曙光信息产业(北京)有限公司 Server anomaly positioning method and device, storage medium and server
CN111767184A (en) * 2020-09-01 2020-10-13 苏州浪潮智能科技有限公司 Fault diagnosis method and device, electronic equipment and storage medium
CN112148576A (en) * 2020-09-28 2020-12-29 北京基调网络股份有限公司 Application performance monitoring method and system and storage medium
CN113064747A (en) * 2021-03-26 2021-07-02 山东英信计算机技术有限公司 Fault positioning method, system and device in server starting process
CN113190278A (en) * 2021-03-18 2021-07-30 山东英信计算机技术有限公司 Multi-scenario fault processing method, system and medium
CN115047322A (en) * 2022-08-17 2022-09-13 中诚华隆计算机技术有限公司 Method and system for identifying fault chip of intelligent medical equipment
WO2022262525A1 (en) * 2021-06-18 2022-12-22 华为技术有限公司 Fault handling method and apparatus, device, and system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110187994A (en) * 2019-05-28 2019-08-30 北京星网锐捷网络技术有限公司 A kind of failure separation method, equipment and fault isolation system
CN110737560B (en) * 2019-10-22 2023-10-20 北京百度网讯科技有限公司 Service state detection method and device, electronic equipment and medium
CN113220407B (en) * 2020-02-04 2023-09-26 北京京东振世信息技术有限公司 Fault exercise method and device
CN113590413B (en) * 2021-06-29 2024-05-10 浪潮商用机器有限公司 UNIX server, and UNIX server fault early warning method and device
CN114389971B (en) * 2022-03-23 2022-12-23 苏州浪潮智能科技有限公司 Intelligent monitoring fine adjustment method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1308843A2 (en) * 2001-11-02 2003-05-07 Siemens Aktiengesellschaft Method for displaying error messages on a microcomputer
CN102369513A (en) * 2011-08-31 2012-03-07 华为技术有限公司 Method for improving stability of computer system and computer system
CN103166773A (en) * 2011-12-09 2013-06-19 国家电网公司 Method and system for monitoring operation state of server
CN103713981A (en) * 2013-12-31 2014-04-09 国网山东省电力公司 Database server performance detection and early warning method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8892719B2 (en) * 2007-08-30 2014-11-18 Alpha Technical Corporation Method and apparatus for monitoring network servers
JP5678717B2 (en) * 2011-02-24 2015-03-04 富士通株式会社 Monitoring device, monitoring system, and monitoring method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xiuzhi (王修智): "Electronic Information Technology" (《电子信息技术》), 30 April 2007 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117299B (en) * 2017-06-23 2022-04-05 佛山市顺德区顺达电脑厂有限公司 Error detecting device and method for server
CN109117299A (en) * 2017-06-23 2019-01-01 佛山市顺德区顺达电脑厂有限公司 Error detecting device for server and debugging method thereof
CN107291584A (en) * 2017-06-27 2017-10-24 郑州云海信息技术有限公司 Chassis failure detection method and system
CN109426606A (en) * 2017-08-23 2019-03-05 东软集团股份有限公司 Kernel failure diagnosis information processing method, device, storage medium and electronic equipment
CN109918257A (en) * 2017-12-12 2019-06-21 杭州海康威视数字技术股份有限公司 Hard disk exception handling method and device
CN109918257B (en) * 2017-12-12 2022-11-04 杭州海康威视数字技术股份有限公司 Hard disk exception handling method and device
CN109697144A (en) * 2018-11-22 2019-04-30 合肥联宝信息技术有限公司 Hard disk detection method for electronic equipment, and electronic equipment
CN109783283A (en) * 2018-12-11 2019-05-21 中国长城科技集团股份有限公司 Method and device for processing hardware detection information, and terminal device
CN111722954A (en) * 2020-06-30 2020-09-29 曙光信息产业(北京)有限公司 Server anomaly positioning method and device, storage medium and server
CN111767184A (en) * 2020-09-01 2020-10-13 苏州浪潮智能科技有限公司 Fault diagnosis method and device, electronic equipment and storage medium
CN112148576A (en) * 2020-09-28 2020-12-29 北京基调网络股份有限公司 Application performance monitoring method and system and storage medium
CN113190278A (en) * 2021-03-18 2021-07-30 山东英信计算机技术有限公司 Multi-scenario fault processing method, system and medium
WO2022198972A1 (en) * 2021-03-26 2022-09-29 山东英信计算机技术有限公司 Method, system and apparatus for fault positioning in starting process of server
CN113064747A (en) * 2021-03-26 2021-07-02 山东英信计算机技术有限公司 Fault positioning method, system and device in server starting process
WO2022262525A1 (en) * 2021-06-18 2022-12-22 华为技术有限公司 Fault handling method and apparatus, device, and system
CN115047322A (en) * 2022-08-17 2022-09-13 中诚华隆计算机技术有限公司 Method and system for identifying fault chip of intelligent medical equipment

Also Published As

Publication number Publication date
WO2017063505A1 (en) 2017-04-20

Similar Documents

Publication Publication Date Title
CN106598790A (en) Server hardware failure detection method, apparatus of server, and server
US11360842B2 (en) Fault processing method, related apparatus, and computer
US8843785B2 (en) Collecting debug data in a secure chip implementation
US20100262863A1 (en) Method and device for the administration of computers
CN104639380A (en) Server monitoring method
CN106776282A (en) Exception handling method and device for a BIOS program
CN106789306A (en) Software fault detection, collection and recovery method and system for communication equipment
CN113806127B (en) Server log collection method, device and readable storage medium
CN104734904B (en) Automatic test method and system for bypass equipment
CN116126772A (en) UART serial port management system and method applied to ARM server
CN103995759B (en) High-availability computer system failure handling method and device based on core internal-external synergy
CN106610878A (en) Fault debugging method for dual-controller system
CN115599617A (en) Bus detection method and device, server and electronic equipment
CN103605593B (en) Fault diagnosis and recovery method and device for a heterogeneous system
CN107179911A (en) Method and apparatus for restarting a management engine
CN113076210A (en) Server fault diagnosis result notification method, system, terminal and storage medium
CN113742113A (en) Embedded system health management method, equipment and storage medium
CN113867994B (en) Cabinet VPD information processing method and device, storage equipment and readable storage medium
CN105160259B (en) Virtualization vulnerability mining system and method based on fuzz testing
CN109284218A (en) Method and device for detecting server operation faults
JPH1188471A (en) Test method and test equipment
JP7367495B2 (en) Information processing equipment and communication cable log information collection method
KR102526368B1 (en) Server management system supporting multi-vendor
CN116489001A (en) Switch fault diagnosis and recovery method and device, switch and storage medium
JP7183841B2 (en) electronic controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination