CN102467438A - Method for obtaining fault signal of storage device by baseboard management controller - Google Patents

Info

Publication number
CN102467438A
CN102467438A CN2010105467572A CN201010546757A CN102467438A CN 102467438 A CN102467438 A CN 102467438A CN 2010105467572 A CN2010105467572 A CN 2010105467572A CN 201010546757 A CN201010546757 A CN 201010546757A CN 102467438 A CN102467438 A CN 102467438A
Authority
CN
China
Prior art keywords
storage device
management controller
baseboard management
bmc
health
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010105467572A
Other languages
Chinese (zh)
Inventor
陈志伟
卢晓芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Corp
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to CN2010105467572A
Publication of CN102467438A
Legal status: Pending

Abstract

The invention discloses a method for obtaining a fault signal of a storage device by using a baseboard management controller (BMC), comprising the following steps: establishing, in a memory region of the BMC, a storage device sensor data record (SDR) and a storage device platform event filter (PEF) corresponding to the storage device; starting a self-monitoring system of the storage device, and causing the self-monitoring system to detect health detection data of the storage device; writing a health threshold corresponding to the health detection data into the memory region of the BMC; and providing an update instruction, so that the self-monitoring system writes the health detection data into the storage device SDR through the update instruction.

Description

Method for obtaining a fault signal of a storage device by using a baseboard management controller
Technical field
The present invention relates to a method for obtaining a fault signal of a storage device, and in particular to a method for obtaining a fault signal of a storage device by using a baseboard management controller (BMC).
Background
With the spread of computers and the rapid development of network technology, services that can be provided by ordinary computers or equipment alone are no longer sufficient, and server technology has developed as a result. A server is a computer platform specialized in handling network workloads; it can be linked to various network systems and provides various application services to the computers connected through those network systems. Most servers have large-capacity storage devices in order to provide services such as multimedia, network hard disks, or enterprise databases. The storage device is therefore a very important component of a server, and a failure of the storage device seriously affects the server and even the services offered to clients.
To manage servers, Intelligent Platform Management Interface (IPMI) technology emerged. An administrator can monitor a server through IPMI and a baseboard management controller (BMC) disposed in the server. In present servers, however, after a storage device fails, independently operating hardware sends the fault signal that lights an indicator on the server to notify the administrator. In other words, the existing fault signal is controlled directly by hard-coded hardware. As a result, existing servers cannot integrate the fault signal with the management mechanism that runs in parallel with it, and cannot notify the administrator of a fault event efficiently.
Summary of the invention
To solve the above problem, an object of the present invention is to provide a method for obtaining a fault signal of a storage device by using a baseboard management controller (BMC). The method is applicable to a server having a BMC and a storage device. The method comprises: establishing, in a memory region of the BMC, a corresponding storage device sensor data record (SDR) and a storage device platform event filter (PEF); starting a self-monitoring system of the storage device (Self-Monitoring, Analysis and Reporting Technology, S.M.A.R.T.), and causing S.M.A.R.T. to periodically detect health detection data of the storage device; writing at least one health threshold corresponding to the health detection data into the memory region of the BMC; and providing an update instruction, so that S.M.A.R.T. periodically writes the health detection data into the storage device SDR through the update instruction.
The health detection data may comprise at least one health item, and each health item corresponds to one health threshold. A health item may be the uncorrectable sector count, the current temperature, or the current rotational speed. A health item may also be the read error rate, the spin retry count, or the current pending sector count.
S.M.A.R.T. may write the health threshold corresponding to the health detection data into the BMC through a set sensor threshold command of the Intelligent Platform Management Interface (IPMI).
According to an embodiment, the method may further comprise: executing a storage device management procedure according to the storage device SDR, the health threshold, and the storage device PEF.
The storage device management procedure may comprise: notifying a remote manager connected to the BMC through an Intelligent Platform Management Bus (IPMB). The storage device management procedure may also comprise: suspending at least one storage unit of the storage device according to the storage device SDR and the health threshold. The storage device management procedure may also comprise: lighting a light-emitting diode (LED) group corresponding to the storage device according to the storage device SDR and the health threshold.
The storage device may comprise a plurality of storage units, in which case the LED group comprises a plurality of LED indicators corresponding respectively to the storage units, and the storage device management procedure may light at least one LED indicator according to the health detection data and the health threshold.
In summary, the method uses the storage device SDR and S.M.A.R.T. to obtain the current health status of the storage device. The BMC can light the corresponding LED group and notify a remote administrator at the same time. The disk-failure indicator mechanism that was controlled by hardware is thus integrated into the events managed by the BMC, which unifies the management interface and improves management efficiency.
The present invention is described below with reference to the accompanying drawings and specific embodiments, which are not intended to limit the invention.
Description of the drawings
Fig. 1 is a schematic diagram of a server according to an embodiment;
Fig. 2 is a flowchart of a method for obtaining a fault signal of a storage device by using a baseboard management controller according to an embodiment;
Fig. 3 is a flowchart of a method for obtaining a fault signal of a storage device by using a baseboard management controller according to another embodiment;
Fig. 4 is a schematic diagram of a server according to another embodiment.
Reference numerals:
20 server
21 baseboard management controller
210 memory region
212 storage device SDR
214 storage device PEF
22 storage device
222, 222a, 222b, 222c storage unit
23 self-monitoring system
24 central processing unit
25 light-emitting diode group
252, 252a, 252b, 252c light-emitting diode indicator
30 remote computer
32 remote manager
Detailed description of the embodiments
The detailed features and advantages of the present invention are described in the embodiments below in sufficient detail for any person skilled in the art to understand the technical content of the invention and implement it accordingly; from the content disclosed in this specification, the claims, and the drawings, any person skilled in the art can readily understand the related objects and advantages of the invention.
The present invention relates to a method for obtaining a fault signal of a storage device by using a baseboard management controller (BMC), applicable to a server having a BMC and a storage device.
Referring to Fig. 1, which is a schematic diagram of a server according to an embodiment, the server 20 comprises a BMC 21, a storage device 22, a self-monitoring system (Self-Monitoring, Analysis and Reporting Technology, S.M.A.R.T.) 23, and a central processing unit (CPU) 24, wherein the CPU 24 is electrically connected to the storage device 22 and S.M.A.R.T. 23. The storage device 22 may be, for example, any large-capacity hard disk or a redundant array of inexpensive disks (RAID) system. The server 20 may also be connected to a remote computer 30 through a network, and the remote computer 30 may manage the server 20 through a remote manager 32 and the BMC 21.
The server 20 may support the Intelligent Platform Management Interface (IPMI) and run an operating system on the above hardware. The server 20 may use an operating system such as Linux or FreeBSD from the Unix family, or Microsoft Windows Server 2003; it may also use a Disk Operating System (DOS) or an Extensible Firmware Interface (EFI) system. The server 20 may be any of various server products of various brands; the present invention places no restriction on this.
In more detail, IPMI is a standard architecture for server management platforms. It comprises five components: the BMC 21, a system interface, non-volatile storage, the Intelligent Platform Management Bus (IPMB), and the Intelligent Chassis Management Bus (ICMB). The most important of these is the BMC 21. The BMC 21 resembles an independent computer, with resources of its own such as a processor and memory. The BMC 21 operates entirely on its own resources and does not occupy the resources of other hardware modules of the server 20. For example, the remote computer 30 may use the iLO system of Hewlett-Packard (HP), the iDRAC system of Dell, or the ESB2 system of Intel.
S.M.A.R.T. 23 is a hard disk self-diagnosis technology developed by IBM, and it has been adopted by major computer hardware manufacturers and hard disk manufacturers. Most present hard disks and disk arrays therefore support S.M.A.R.T. 23. In short, S.M.A.R.T. 23 is a system for monitoring the storage device 22: it detects and reports the health status of the storage device 22. S.M.A.R.T. 23 can periodically detect various items such as the current temperature or the current rotational speed of the storage device 22.
Referring to Fig. 1 together with Fig. 2, Fig. 2 is a flowchart of the method for obtaining a fault signal of a storage device by using a BMC according to an embodiment. First, a corresponding storage device sensor data record (SDR) 212 and a storage device platform event filter (PEF) 214 are established in the memory region 210 of the BMC 21 (step S100). The storage device PEF 214 may contain at least one storage device management procedure, which gives the BMC 21 a basis for managing the storage device 22.
After the storage device SDR 212 and the storage device PEF 214 corresponding to the storage device 22 are established, S.M.A.R.T. 23 of the storage device 22 is started and caused to periodically detect health detection data of the storage device 22 (step S110). Next, at least one health threshold corresponding to the health detection data is written into the memory region 210 of the BMC 21 (step S120). The memory region 210 of the BMC 21 thus holds at least one mutually corresponding set of storage device SDR 212, storage device PEF 214, and health threshold.
The health detection data may comprise at least one health item, and each health item corresponds to one health threshold. A health item may be, for example, the uncorrectable sector count, the current temperature, or the current rotational speed. A health item may also be the read error rate, the spin retry count, or the current pending sector count. S.M.A.R.T. 23 may write the health threshold corresponding to the health detection data into the BMC 21 through the IPMI set sensor threshold command.
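As an illustration of step S120, the following minimal sketch packs the request data of an IPMI Set Sensor Thresholds command (network function 0x04, command 0x26 in the IPMI specification). The patent does not say which sensor number or which threshold is used, so the sensor number 0x30, the choice of the upper critical threshold, and the function name build_set_threshold_request are assumptions made only for this example. Threshold bytes are raw sensor values, and converting them to physical units depends on the sensor's SDR.

```python
# Illustrative sketch only: builds the data bytes of an IPMI
# "Set Sensor Thresholds" request. Sensor number and values are assumptions.
NETFN_SENSOR_EVENT = 0x04          # Sensor/Event network function
CMD_SET_SENSOR_THRESHOLDS = 0x26   # Set Sensor Thresholds command

# Mask bit i enables threshold i; the six threshold bytes follow in this order.
THRESHOLD_ORDER = (
    "lower_non_critical", "lower_critical", "lower_non_recoverable",
    "upper_non_critical", "upper_critical", "upper_non_recoverable",
)

def build_set_threshold_request(sensor_number, **thresholds):
    """Return request data: sensor number, mask byte, six raw threshold bytes."""
    unknown = set(thresholds) - set(THRESHOLD_ORDER)
    if unknown:
        raise ValueError(f"unknown threshold names: {sorted(unknown)}")
    mask = 0
    values = []
    for bit, name in enumerate(THRESHOLD_ORDER):
        raw = thresholds.get(name)
        if raw is None:
            values.append(0)  # placeholder byte; its mask bit stays clear
            continue
        if not 0 <= raw <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
        mask |= 1 << bit
        values.append(raw)
    return bytes([sensor_number, mask, *values])

# Hypothetical temperature sensor 0x30 with an upper critical raw value of 55.
request = build_set_threshold_request(0x30, upper_critical=55)
print(request.hex(" "))
```

How the request is transported to the BMC (system interface, IPMB, or LAN) is outside the scope of this sketch.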
The BMC 21 may provide an update instruction, so that S.M.A.R.T. 23 periodically writes the latest health detection data into the storage device SDR 212 through the update instruction (step S130). In other words, the BMC 21 can obtain the current health status of the storage device 22 through the storage device SDR 212 and S.M.A.R.T. 23, without installing an additional detector for the storage device 22.
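Steps S100 to S130 can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the patented implementation: the class names, the dictionary layout, and the smart_poll helper are hypothetical, and a real BMC stores SDRs in its own binary format.

```python
from dataclasses import dataclass, field

@dataclass
class StorageSDR:
    """Toy stand-in for storage device SDR 212: latest value per health item."""
    readings: dict = field(default_factory=dict)

@dataclass
class BMCMemoryRegion:
    """Toy stand-in for memory region 210 (SDR plus health thresholds)."""
    sdr: StorageSDR = field(default_factory=StorageSDR)
    thresholds: dict = field(default_factory=dict)

    def update(self, item, value):
        """The 'update instruction': overwrite the SDR entry of one health item."""
        self.sdr.readings[item] = value

def smart_poll(read_health_data, region):
    """One periodic S.M.A.R.T. pass: push every detected item into the SDR."""
    for item, value in read_health_data().items():
        region.update(item, value)

# S100/S120: create the region and store one threshold (assumed value).
region = BMCMemoryRegion(thresholds={"temperature": 55})
# S110/S130: a fake detector stands in for the drive's S.M.A.R.T. data.
smart_poll(lambda: {"temperature": 41, "uncorrectable_sector_count": 0}, region)
print(region.sdr.readings)
```

The point of the design is that the BMC only ever reads its own SDR; the drive-side monitor is responsible for keeping it fresh.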
Referring to Fig. 3, which is a flowchart of the method for obtaining a fault signal of a storage device by using a BMC according to another embodiment, the method may further execute a storage device management procedure according to the storage device SDR 212, the health threshold, and the storage device PEF 214 (step S140). The BMC 21 periodically reads at least one health item and compares the current value of that health item with the corresponding health threshold to determine whether the storage device 22 is abnormal. For example, when the current temperature or the uncorrectable sector count of the storage device 22 exceeds the corresponding health threshold, the BMC 21 may determine that the storage device 22 has failed. The BMC 21 may write the fault event into a system event log (SEL), then look up a suitable storage device management procedure in the storage device PEF 214 according to the content of the SEL and execute it.
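The comparison in step S140 can be sketched as follows. The function name evaluate, the action strings, and the representation of the PEF as a mapping from health item to actions are simplifying assumptions; an actual PEF matches event fields against filter table entries rather than item names.

```python
def evaluate(readings, thresholds, pef, sel):
    """Compare each reading with its threshold, log faults to the SEL,
    and return the PEF actions that should be executed."""
    actions = []
    for item, value in readings.items():
        limit = thresholds.get(item)
        if limit is not None and value > limit:
            sel.append({"item": item, "value": value, "threshold": limit})
            actions.extend(pef.get(item, []))
    return actions

sel = []  # toy system event log
pef = {"temperature": ["notify_remote_manager", "light_led"]}  # assumed actions
actions = evaluate(
    {"temperature": 61, "uncorrectable_sector_count": 0},
    {"temperature": 55, "uncorrectable_sector_count": 10},
    pef,
    sel,
)
print(actions)
print(sel)
```

Only the temperature exceeds its threshold here, so one SEL entry is written and the two actions registered for that item are returned.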
The storage device management procedure may comprise: notifying the remote manager 32 connected to the BMC 21 through the IPMB. When the failure of the storage device 22 is serious, the storage device management procedure may also comprise: suspending at least one storage unit of the storage device 22 according to the storage device SDR 212 and the health threshold. In addition, the storage device management procedure may comprise: lighting a light-emitting diode (LED) group 25 corresponding to the storage device 22 according to the storage device SDR 212 and the health threshold. In other words, the functions of lighting the LED group 25 and notifying the remote manager 32 are both integrated into the BMC 21.
Referring to Fig. 4, which is a schematic diagram of a server according to another embodiment, the storage device 22 may comprise a plurality of storage units 222, for example storage unit 222a, storage unit 222b, and storage unit 222c. The LED group 25 may then comprise as many LED indicators 252 as there are storage units 222, for example LED indicator 252a, LED indicator 252b, and LED indicator 252c. From the storage device SDR 212 and the health threshold, the BMC 21 can determine which storage unit 222 in the storage device 22 has failed, and light the LED indicator 252 corresponding to the failed storage unit 222 accordingly. An administrator who comes to inspect the server 20 can thus easily see the failure condition of the storage device 22.
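The per-unit indicator selection of Fig. 4 can be sketched as below. The identifiers reuse the reference numerals from the description as plain strings; the function name leds_to_light, the readings, and the thresholds are assumptions for illustration.

```python
def leds_to_light(unit_readings, thresholds, led_for_unit):
    """Return the LED indicators of every storage unit with at least one
    health item above its threshold."""
    leds = []
    for unit, readings in unit_readings.items():
        faulty = any(
            item in readings and readings[item] > limit
            for item, limit in thresholds.items()
        )
        if faulty:
            leds.append(led_for_unit[unit])
    return leds

led_for_unit = {"222a": "252a", "222b": "252b", "222c": "252c"}
unit_readings = {
    "222a": {"temperature": 40, "uncorrectable_sector_count": 0},
    "222b": {"temperature": 42, "uncorrectable_sector_count": 12},
    "222c": {"temperature": 39, "uncorrectable_sector_count": 1},
}
thresholds = {"temperature": 55, "uncorrectable_sector_count": 10}
lit = leds_to_light(unit_readings, thresholds, led_for_unit)
print(lit)
```

In this example only storage unit 222b exceeds a threshold, so only its indicator is selected.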
In summary, the method for obtaining a fault signal of a storage device by using a BMC provides S.M.A.R.T. with an update instruction for updating the storage device SDR, and thereby obtains the current health status of the storage device. After an abnormality is detected, the storage device management procedure can both light the corresponding LED group and notify a remote administrator. In other words, the disk-failure indicator mechanism that was originally controlled independently by hardware is integrated into the events managed by the BMC, which unifies the management interface. This resolves the uncoordinated, parallel management mechanisms of the prior art, allows the server to be managed in a simpler and more efficient way, and allows the administrator to be notified efficiently when a fault event occurs.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (10)

1. A method for obtaining a fault signal of a storage device by using a baseboard management controller (BMC), applicable to a server having a BMC and a storage device, characterized in that the method comprises:
establishing, in a memory region of the BMC, a corresponding storage device sensor data record (SDR) and a storage device platform event filter (PEF);
starting a self-monitoring system of the storage device, and causing the self-monitoring system to periodically detect health detection data of the storage device;
writing at least one health threshold corresponding to the health detection data into the memory region of the BMC; and
providing an update instruction, so that the self-monitoring system periodically writes the health detection data into the storage device SDR through the update instruction.
2. The method according to claim 1, characterized in that the health detection data comprise at least one health item, and each health item corresponds to one health threshold.
3. The method according to claim 2, characterized in that the health item is an uncorrectable sector count, a current temperature, or a current rotational speed.
4. The method according to claim 3, characterized in that the health item is a read error rate, a spin retry count, or a current pending sector count.
5. The method according to claim 1, characterized in that the self-monitoring system writes the health threshold corresponding to the health detection data into the BMC through a set sensor threshold command of an Intelligent Platform Management Interface.
6. The method according to claim 1, characterized in that the method further comprises:
executing a storage device management procedure according to the storage device SDR, the health threshold, and the storage device PEF.
7. The method according to claim 6, characterized in that the storage device management procedure comprises:
notifying a remote manager connected to the BMC through an Intelligent Platform Management Bus.
8. The method according to claim 6, characterized in that the storage device management procedure comprises:
suspending at least one storage unit of the storage device according to the storage device SDR and the health threshold.
9. The method according to claim 6, characterized in that the storage device management procedure comprises:
lighting a light-emitting diode group corresponding to the storage device according to the storage device SDR and the health threshold.
10. The method according to claim 9, characterized in that the storage device comprises a plurality of storage units, the light-emitting diode group comprises a plurality of light-emitting diode indicators corresponding respectively to the storage units, and the storage device management procedure lights at least one of the light-emitting diode indicators according to the health detection data and the health threshold.
CN2010105467572A 2010-11-12 2010-11-12 Method for obtaining fault signal of storage device by baseboard management controller Pending CN102467438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105467572A CN102467438A (en) 2010-11-12 2010-11-12 Method for obtaining fault signal of storage device by baseboard management controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105467572A CN102467438A (en) 2010-11-12 2010-11-12 Method for obtaining fault signal of storage device by baseboard management controller

Publications (1)

Publication Number Publication Date
CN102467438A true CN102467438A (en) 2012-05-23

Family

ID=46071102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105467572A Pending CN102467438A (en) 2010-11-12 2010-11-12 Method for obtaining fault signal of storage device by baseboard management controller

Country Status (1)

Country Link
CN (1) CN102467438A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938450A (en) * 2015-03-06 2016-09-14 广达电脑股份有限公司 Automatic debug information collection method and system
CN106294065A (en) * 2016-07-28 2017-01-04 联想(北京)有限公司 Hard disk failure monitoring method, Apparatus and system
CN107239385A (en) * 2017-06-06 2017-10-10 郑州云海信息技术有限公司 A kind of server and instruction lamp control method
CN109981417A (en) * 2019-04-08 2019-07-05 苏州浪潮智能科技有限公司 A kind of test method and device of server state monitoring stability

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007109238A (en) * 2005-10-14 2007-04-26 Dell Products Lp System and method for logging recoverable error
CN101135988A (en) * 2006-08-15 2008-03-05 环达电脑(上海)有限公司 Remote monitor module for computer initialization
US20080065928A1 (en) * 2006-09-08 2008-03-13 International Business Machines Corporation Technique for supporting finding of location of cause of failure occurrence
CN101231612A (en) * 2007-01-25 2008-07-30 宏正自动科技股份有限公司 Intelligent platform supervision interface system and method
CN101741600A (en) * 2008-11-27 2010-06-16 英业达股份有限公司 Server system, recording equipment and management method thereof
TW201033805A (en) * 2009-03-13 2010-09-16 Giga Byte Tech Co Ltd Apparatus and method for monitoring server

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938450A (en) * 2015-03-06 2016-09-14 广达电脑股份有限公司 Automatic debug information collection method and system
CN105938450B (en) * 2015-03-06 2019-04-12 广达电脑股份有限公司 The method and system that automatic debugging information is collected
CN106294065A (en) * 2016-07-28 2017-01-04 联想(北京)有限公司 Hard disk failure monitoring method, Apparatus and system
CN107239385A (en) * 2017-06-06 2017-10-10 郑州云海信息技术有限公司 A kind of server and instruction lamp control method
CN109981417A (en) * 2019-04-08 2019-07-05 苏州浪潮智能科技有限公司 A kind of test method and device of server state monitoring stability

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120523