CN116166495A - An out-of-band hard disk failure prediction system, method, device and readable storage medium - Google Patents
- Publication number
- CN116166495A (application number CN202211574304.XA)
- Authority
- CN
- China
- Prior art keywords
- hard disk
- data
- bmc
- smart
- band
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3037—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3089—Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
- G06F11/3093—Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention proposes an out-of-band hard disk failure prediction system, method, device and readable storage medium. The system comprises a PC end, a BMC, an expansion card and a hard disk backplane to which a plurality of hard disks are connected. The PC end is connected to the BMC over a network, and the expansion card has data connections to both the BMC and the hard disk backplane. The PC end accesses the BMC through a preset access method and obtains the raw SMART data of the hard disks as well as the failure warning data processed by the BMC. The BMC accesses the expansion card and obtains the raw SMART data of the hard disks from it by polling. The expansion card reads the SMART Data and SMART Threshold data of the hard disks and, when polled by the BMC, returns the data it has read to the BMC according to the interaction rules. The invention obtains the SMART information of the hard disks out of band by way of a SAS Expander, and thereby implements hard disk failure prediction and early warning.
Description
Technical Field
The present invention relates to the field of computer technology, and more particularly to an out-of-band hard disk failure prediction system, method, device and readable storage medium.
Background Art
A server is a type of computer that runs faster, carries a higher load, and costs more than an ordinary computer. Within a network, a server provides computing or application services to other clients (terminals such as PCs, smartphones and ATMs, and even large equipment such as train systems). A server offers high-speed CPU computing power, long-term reliable operation, high external I/O data throughput, and better scalability.
By use, servers fall into two categories: general-purpose servers and dedicated servers. A general-purpose server is not designed for any particular service and can provide a variety of service functions; most current servers are general-purpose. A dedicated (or "functional") server is designed specifically for one or a few functions and differs from a general-purpose server in some respects. A storage server, for example, carries a large number of special features, including storage management software, additional hardware to ensure high flexibility, RAID configuration types, and additional network connections so that more desktop users can connect to it. Its performance requirements differ from those of a general-purpose server, and it also has special requirements for storage space.
At present, because the number of hard disks that can be attached directly to the CPU is limited, storage servers use a SAS Expander to attach a large number of high-capacity mechanical hard disks and thereby achieve mass storage. Once a hard disk is damaged, however, data recovery is required, and in severe cases data may be lost. It is therefore highly worthwhile to provide failure warning and reliability assessment for enterprise-class hard disks, so as to give effective guidance for enterprise storage operation and maintenance and to prevent data corruption or loss.
Summary of the Invention
In view of the above problems, the purpose of the present invention is to provide an out-of-band hard disk failure prediction system, method, device and readable storage medium that obtain the SMART information of the hard disks out of band by way of a SAS Expander and implement hard disk failure prediction and early warning.
To achieve the above purpose, the present invention adopts the following technical solution: an out-of-band hard disk failure prediction system comprising a PC end, a BMC, an expansion card and a hard disk backplane, with a plurality of hard disks connected to the hard disk backplane. The PC end is connected to the BMC over a network, and the expansion card has data connections to both the BMC and the hard disk backplane. The PC end accesses the BMC through a preset access method and obtains the raw SMART data of the hard disks as well as the failure warning data processed by the BMC. The BMC accesses the expansion card and obtains the raw SMART data of the hard disks from it by polling. The expansion card reads the SMART Data and SMART Threshold data of the hard disks and, when polled by the BMC, returns the data it has read to the BMC according to the interaction rules.
Further, the preset access methods include ipmitool, Redfish and a web interface.
Further, the BMC is connected to the expansion card via an I2C bus, and the expansion card is connected to the hard disk backplane via a SATA interface.
Further, 12 hard disks are connected to the hard disk backplane, and the SMART data of each hard disk comprises no more than 30 groups of attributes and corresponding attribute values.
Correspondingly, the present invention also discloses an out-of-band hard disk failure prediction method, comprising: the BMC sends an information acquisition request to the expansion card;
after receiving the request, the expansion card polls the raw SMART data of all hard disks in turn and returns the aggregated SMART data to the BMC;
the BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method.
Further, the step in which the expansion card polls the raw SMART data of all hard disks in turn after receiving the request and returns the aggregated SMART data to the BMC comprises: after receiving the request, the expansion card polls the raw SMART data of all hard disks in turn, aggregates the raw SMART data into a hard disk raw data table, and sends the hard disk raw data table to the BMC, where the hard disk raw data table includes hard disk ID information and the corresponding SMART ID information.
Further, the step in which the BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method comprises: after obtaining the SMART data, the BMC runs a preset hard disk failure prediction algorithm on the BMC side to perform failure warning analysis, and sends the analysis results to the PC end via ipmitool, Redfish or the web interface.
Further, the step in which the BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method comprises:
after obtaining the SMART data, the BMC sends the SMART data to the PC end via ipmitool, Redfish or the web interface; after receiving the data, the PC end uses a customized hard disk failure prediction tool to perform failure warning analysis on the SMART data and displays the analysis results.
Correspondingly, the present invention discloses an out-of-band hard disk failure prediction device, comprising:
a memory, configured to store an out-of-band hard disk failure prediction program;
a processor, configured to implement the steps of the out-of-band hard disk failure prediction method described in any of the above when executing the out-of-band hard disk failure prediction program.
Correspondingly, the present invention discloses a readable storage medium on which an out-of-band hard disk failure prediction program is stored. When the program is executed by a processor, the steps of the out-of-band hard disk failure prediction method described in any of the above are implemented.
Compared with the prior art, the beneficial effects of the present invention are as follows: the present invention discloses an out-of-band hard disk failure prediction system, method, device and readable storage medium that obtain the SMART information of the hard disks out of band by way of a SAS Expander. No software needs to be installed under the system OS, hard disk failure prediction and early warning can be carried out without the system user noticing, and centralized management of server hard disk failure prediction and monitoring is made easier.
It can be seen that, compared with the prior art, the present invention has prominent substantive features and represents notable progress, and the beneficial effects of its implementation are also evident.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a system structure diagram of a specific embodiment of the present invention.
FIG. 2 is a method flow chart of a specific embodiment of the present invention.
FIG. 3 is a schematic flow diagram of a specific embodiment of the present invention.
Detailed Description
The core of the present invention is to provide an out-of-band hard disk failure prediction method. In the prior art, because the number of hard disks that can be attached directly to the CPU is limited, storage servers use a SAS Expander to attach a large number of high-capacity mechanical hard disks and thereby achieve mass storage. Once a hard disk is damaged, however, data recovery is required, and in severe cases data may be lost. The prior art cannot provide failure warning and reliability assessment for enterprise-class hard disks.
In the out-of-band hard disk failure prediction method provided by the present invention, the BMC first sends an information acquisition request to the expansion card. After receiving the request, the expansion card polls the raw SMART data of all hard disks in turn and returns the aggregated SMART data to the BMC. Finally, the BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method. The present invention thus obtains the SMART information of the hard disks out of band by way of a SAS Expander and implements hard disk failure prediction and early warning.
To enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1:
As shown in FIG. 1, this embodiment provides an out-of-band hard disk failure prediction system comprising a PC end, a BMC, an expansion card and a hard disk backplane, with a plurality of hard disks connected to the hard disk backplane.
The PC end is connected to the BMC over a network, and the expansion card has data connections to both the BMC and the hard disk backplane. The PC end accesses the BMC through a preset access method and obtains the raw SMART data of the hard disks as well as the failure warning data processed by the BMC. The BMC accesses the expansion card and obtains the raw SMART data of the hard disks from it by polling. The expansion card reads the SMART Data and SMART Threshold data of the hard disks and, when polled by the BMC, returns the data it has read to the BMC according to the interaction rules.
The preset access methods include ipmitool, Redfish and a web interface.
As an example, the BMC is connected to the expansion card via an I2C bus, and the expansion card is connected to the hard disk backplane via a SATA interface. Twelve hard disks are connected to the hard disk backplane, and the SMART data of each hard disk comprises no more than 30 groups of attributes and corresponding attribute values.
It should be noted in particular that, in this system, the PC end accesses the BMC remotely over the network and obtains the raw SMART data of the hard disks as well as the failure warning data processed by the BMC; the access method may be ipmitool, Redfish or the web interface. The customer can use the BMC's hard disk failure warning information (a numerical value indicating health status) directly, or perform custom failure warning processing on the raw SMART data. The BMC accesses the expansion card (Expander) via the I2C bus and obtains the raw SMART data of the hard disks from the Expander by polling. Specifically, the SMART data of each hard disk is polled in turn; according to the specification, the SMART data of each hard disk consists of no more than 30 groups of attributes and attribute values. Through the SATA interface, the Expander reads each hard disk's raw SMART Data and SMART Threshold data, which are required for hard disk failure prediction. When polled by the BMC, the Expander returns this data to the BMC according to the interaction specification.
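As an illustration of the two data sets the Expander reads, the following Python sketch decodes a 512-byte SMART Data sector and a 512-byte SMART Threshold sector into per-attribute values. The byte layout used here (a 2-byte revision field followed by up to 30 entries of 12 bytes each) is an assumption based on the layout commonly described for ATA SMART; the patent text does not spell it out, and the function and class names are hypothetical.

```python
import struct
from typing import Dict, NamedTuple

# Assumed layout (not stated in the patent text): a 2-byte revision number,
# then up to 30 attribute entries of 12 bytes each.
ATTR_TABLE_OFFSET = 2
ATTR_ENTRY_SIZE = 12
MAX_ATTRS = 30  # the text says no more than 30 attribute groups per disk


class SmartAttribute(NamedTuple):
    attr_id: int
    flags: int
    value: int   # normalized current value
    worst: int   # worst normalized value seen so far
    raw: int     # 48-bit vendor-specific raw value


def parse_smart_data(sector: bytes) -> Dict[int, SmartAttribute]:
    """Decode the attribute table of a SMART Data sector."""
    attrs = {}
    for i in range(MAX_ATTRS):
        off = ATTR_TABLE_OFFSET + i * ATTR_ENTRY_SIZE
        attr_id, flags, value, worst = struct.unpack_from("<BHBB", sector, off)
        if attr_id == 0:  # an ID of 0 marks an unused slot
            continue
        raw = int.from_bytes(sector[off + 5:off + 11], "little")
        attrs[attr_id] = SmartAttribute(attr_id, flags, value, worst, raw)
    return attrs


def parse_smart_thresholds(sector: bytes) -> Dict[int, int]:
    """Decode a SMART Threshold sector into {attribute ID: threshold}."""
    thresholds = {}
    for i in range(MAX_ATTRS):
        off = ATTR_TABLE_OFFSET + i * ATTR_ENTRY_SIZE
        attr_id, threshold = struct.unpack_from("<BB", sector, off)
        if attr_id != 0:
            thresholds[attr_id] = threshold
    return thresholds
```

Keeping the two parsers separate mirrors the text: attribute values and thresholds come from different reads, and a prediction step needs both.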
This embodiment provides an out-of-band hard disk failure prediction system that obtains the SMART information of the hard disks out of band by way of a SAS Expander. No software needs to be installed under the system OS, hard disk failure prediction and early warning can be carried out without the system user noticing, and centralized management of server hard disk failure prediction and monitoring is made easier.
Embodiment 2:
Based on Embodiment 1, and as shown in FIG. 2, the present invention also discloses an out-of-band hard disk failure prediction method comprising the following steps:
S1: The BMC sends an information acquisition request to the expansion card.
S2: After receiving the request, the expansion card polls the raw SMART data of all hard disks in turn and returns the aggregated SMART data to the BMC.
Specifically, after receiving the request, the expansion card polls the raw SMART data of all hard disks in turn, aggregates the raw SMART data into a hard disk raw data table, and sends the hard disk raw data table to the BMC. The hard disk raw data table includes hard disk ID information and the corresponding SMART ID information.
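The aggregation in step S2 can be pictured with a small Python sketch. It assumes each disk's SMART data has already been read as a list of (SMART ID, value, worst, raw) tuples; that tuple shape and the function name `build_raw_table` are hypothetical and chosen only for illustration. The only detail taken from the text is that each table row carries a hard disk ID and a SMART ID.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical per-attribute reading: (smart_id, value, worst, raw)
Reading = Tuple[int, int, int, int]
# Table key taken from the text: (hard disk ID, SMART ID)
Key = Tuple[int, int]


def build_raw_table(per_disk: Dict[int, Iterable[Reading]]) -> Dict[Key, Tuple[int, int, int]]:
    """Aggregate per-disk readings into one table keyed by (disk ID, SMART ID)."""
    table: Dict[Key, Tuple[int, int, int]] = {}
    for disk_id in sorted(per_disk):  # poll the disks in a fixed order
        for smart_id, value, worst, raw in per_disk[disk_id]:
            table[(disk_id, smart_id)] = (value, worst, raw)
    return table


# Usage: two disks, three readings in total
table = build_raw_table({
    0: [(5, 98, 98, 3), (9, 90, 90, 1200)],
    1: [(5, 100, 100, 0)],
})
```

Keying on the (disk ID, SMART ID) pair lets the BMC look up a single attribute of a single disk without scanning the whole table.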
S3: The BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method.
In this step, hard disk failure prediction processing can be carried out in either of the following two ways:
1. After obtaining the SMART data, the BMC runs a preset hard disk failure prediction algorithm on the BMC side to perform failure warning analysis, and sends the analysis results to the PC end via ipmitool, Redfish or the web interface.
2. After obtaining the SMART data, the BMC sends the SMART data to the PC end via ipmitool, Redfish or the web interface; after receiving the data, the PC end uses a customized hard disk failure prediction tool to perform failure warning analysis on the SMART data and displays the analysis results.
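The text does not specify the prediction algorithm used in either mode. As one minimal sketch of what such processing could look like, the Python below applies the conventional SMART threshold rule, under which an attribute is considered failing when its normalized value is at or below a non-zero threshold. The rule, the margin score and both function names are assumptions for illustration, not the patent's algorithm.

```python
from typing import Dict, List, Optional


def failing_attributes(values: Dict[int, int], thresholds: Dict[int, int]) -> List[int]:
    """Return the IDs of attributes whose normalized value is at or below threshold.

    Assumption: a threshold of 0 (or a missing threshold) means the attribute
    is informational and never trips a warning.
    """
    return sorted(
        attr_id
        for attr_id, value in values.items()
        if thresholds.get(attr_id, 0) > 0 and value <= thresholds[attr_id]
    )


def health_margin(values: Dict[int, int], thresholds: Dict[int, int]) -> Optional[int]:
    """Smallest (value - threshold) over thresholded attributes, or None if there are none.

    A hypothetical single-number health indicator: smaller means closer to failure.
    """
    margins = [
        value - thresholds[attr_id]
        for attr_id, value in values.items()
        if thresholds.get(attr_id, 0) > 0
    ]
    return min(margins) if margins else None


# Usage: attribute 197 has dropped below its threshold, attribute 9 has none
values = {5: 100, 9: 80, 197: 10}
thresholds = {5: 36, 9: 0, 197: 20}
print(failing_attributes(values, thresholds))  # [197]
print(health_margin(values, thresholds))       # -10
```

Because the functions take plain dictionaries, the same logic could run on the BMC side (mode 1) or in a PC-side tool working from the raw data (mode 2).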
Embodiment 3:
Based on the above embodiments, and as shown in FIG. 3, this embodiment also discloses an out-of-band hard disk failure prediction method comprising a BMC part and an Expander part.
The BMC part acts as the I2C master and obtains the raw SMART data of the hard disks from the Expander. It polls repeatedly, fetching one or more SMART data records per poll, until all SMART data of all hard disks has been returned. After obtaining the SMART data, the BMC can perform hard disk failure prediction processing on the BMC side and send the processed data to the client, or it can send the raw SMART data directly to the client and leave customized failure prediction processing to the client. The BMC can pass this data to the client by several means, such as ipmitool, Redfish or the web interface.
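The repeated polling described above can be sketched in Python as a loop that keeps requesting the next batch of records until the Expander has nothing more to return. The I2C transaction itself is abstracted behind a `read_chunk(start, count)` callable; that interface, the `FakeExpander` stand-in and the `max_polls` guard are all hypothetical, since the text does not define the interaction specification.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record shape: (disk_id, smart_id, value)
Record = Tuple[int, int, int]


def poll_all(
    read_chunk: Callable[[int, int], Sequence[Record]],
    chunk_size: int = 4,
    max_polls: int = 1000,
) -> List[Record]:
    """Poll until an empty chunk signals that every record has been returned."""
    records: List[Record] = []
    for _ in range(max_polls):
        chunk = read_chunk(len(records), chunk_size)
        if not chunk:
            return records
        records.extend(chunk)
    # Guard against an Expander that never reports completion.
    raise TimeoutError("expander did not finish within max_polls")


class FakeExpander:
    """In-memory stand-in for the I2C slave, used only to exercise the loop."""

    def __init__(self, table: Sequence[Record]) -> None:
        self._table = list(table)

    def read_chunk(self, start: int, count: int) -> Sequence[Record]:
        return self._table[start:start + count]


# Usage: 12 disks with three attributes each, fetched five records per poll
table = [(disk, attr, 100) for disk in range(12) for attr in (5, 9, 197)]
fetched = poll_all(FakeExpander(table).read_chunk, chunk_size=5)
```

Bounding the number of polls is a design choice added here so that a stuck slave cannot hang the BMC task indefinitely.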
The Expander part uses the SATA interface to poll the SMART information of all hard disks in turn and aggregates it. Table 1 shows an example of the raw SMART data of a hard disk obtained by the Expander; the column headings are as described in the SMART specification "SFF8035-R2". The aggregated data is packaged in units of (hard disk ID, SMART ID): each record stores exactly one SMART attribute of one hard disk and corresponds to one row of the table. The Expander acts as the I2C slave of the BMC and returns the SMART information of the hard disks to the BMC when the BMC issues a request.
Table 1 - SMART raw data table of a single hard disk
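To make the (hard disk ID, SMART ID) packaging concrete, the following Python sketch encodes one record as a fixed-size byte string and decodes it again. The 12-byte wire layout, the field order and the function names are hypothetical; the text only states that each record holds one SMART attribute of one hard disk.

```python
import struct
from typing import Tuple

# Hypothetical wire layout, little-endian, 12 bytes per record:
# disk_id (1) | smart_id (1) | flags (2) | value (1) | worst (1) | raw (6)
RECORD_SIZE = 12
_HEADER = "<BBHBB"

Record = Tuple[int, int, int, int, int, int]


def pack_record(disk_id: int, smart_id: int, flags: int,
                value: int, worst: int, raw: int) -> bytes:
    """Encode one (disk ID, SMART ID) record for transfer to the BMC."""
    header = struct.pack(_HEADER, disk_id, smart_id, flags, value, worst)
    return header + raw.to_bytes(6, "little")


def unpack_record(buf: bytes) -> Record:
    """Decode a record produced by pack_record."""
    if len(buf) < RECORD_SIZE:
        raise ValueError("record too short")
    disk_id, smart_id, flags, value, worst = struct.unpack_from(_HEADER, buf)
    raw = int.from_bytes(buf[6:12], "little")
    return disk_id, smart_id, flags, value, worst, raw


# Usage: disk 3, attribute 9 with a raw value of 1200
blob = pack_record(3, 9, 0x0032, 90, 90, 1200)
```

A fixed record size keeps the I2C exchange simple: the BMC can compute how many records fit in one transfer and split a buffer without any length prefix.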
This embodiment provides an out-of-band hard disk failure prediction method that obtains the SMART information of the hard disks out of band by way of a SAS Expander. No software needs to be installed under the system OS, hard disk failure prediction and early warning can be carried out without the system user noticing, and centralized management of server hard disk failure prediction and monitoring is made easier.
Embodiment 4:
This embodiment discloses an out-of-band hard disk failure prediction device comprising a processor and a memory, where the processor implements the following steps when executing the out-of-band hard disk failure prediction program stored in the memory:
1. The BMC sends an information acquisition request to the expansion card.
2. After receiving the request, the expansion card polls the raw SMART data of all hard disks in turn and returns the aggregated SMART data to the BMC.
3. The BMC, in cooperation with the PC end, performs hard disk failure prediction processing based on the SMART data and feeds back the data through a preset transmission method.
Further, the out-of-band hard disk failure prediction device in this embodiment may also include:
An input interface, used to obtain an externally imported out-of-band hard disk failure prediction program and save it to the memory. It can also be used to obtain the various instructions and parameters transmitted by external terminal devices and pass them to the processor, so that the processor can carry out the corresponding processing with them. In this embodiment, the input interface may include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.
An output interface, used to output the data generated by the processor to the terminal devices connected to it, so that those devices can obtain the data. In this embodiment, the output interface may include, but is not limited to, a USB interface, a serial interface, and the like.
A communication unit, used to establish a remote communication connection between the out-of-band hard disk failure prediction device and an external server, so that the device can mount an image file to the external server. In this embodiment, the communication unit may include, but is not limited to, a remote communication unit based on wireless or wired communication technology.
A keyboard, used to obtain the parameter data or instructions that the user enters by pressing keys in real time.
A display, used to show in real time the information related to the running server power supply line short-circuit locating process.
A mouse, which can be used to help the user enter data and to simplify the user's operations.
Embodiment 5:
This embodiment also discloses a readable storage medium. The readable storage medium referred to here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable hard disks, CD-ROM, or any other form of storage medium known in the art. The readable storage medium stores an out-of-band hard disk failure prediction program which, when executed by a processor, implements the following steps:
1. The BMC sends an information acquisition request to the expansion card.
2. After receiving the request, the expansion card polls the raw SMART data of all hard disks in turn, aggregates the data, and returns the aggregated SMART data to the BMC.
3. The BMC, working with the PC side, performs hard disk failure prediction based on the SMART data and reports the results through a preset transmission method.
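The three steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class names `ExpansionCard` and `Bmc`, the in-memory "disks", and the rule that flags a disk when a watched attribute is non-zero are all hypothetical. The description does not specify the prediction algorithm or the transport between the BMC and the expansion card, so a simple threshold check stands in for the prediction step.

```python
from dataclasses import dataclass
from typing import Dict, List

# SMART attribute IDs commonly associated with media degradation.
# Treating any non-zero raw value as a warning is an illustrative
# assumption, not a rule taken from the patent.
WATCHED_ATTRIBUTES: Dict[int, str] = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


@dataclass
class SmartRecord:
    """Raw SMART values for one disk: slot number plus {attribute id: raw value}."""
    slot: int
    raw: Dict[int, int]


class ExpansionCard:
    """Hypothetical stand-in for the expansion card (step 2)."""

    def __init__(self, disks: Dict[int, Dict[int, int]]):
        self._disks = disks  # slot -> raw SMART values (simulated hardware)

    def handle_info_request(self) -> List[SmartRecord]:
        # Poll every disk in slot order and return the aggregated result.
        return [SmartRecord(slot, dict(self._disks[slot])) for slot in sorted(self._disks)]


class Bmc:
    """Hypothetical stand-in for the BMC (steps 1 and 3)."""

    def __init__(self, card: ExpansionCard):
        self._card = card

    def collect(self) -> List[SmartRecord]:
        # Step 1: send the information acquisition request to the expansion card.
        return self._card.handle_info_request()

    @staticmethod
    def predict(records: List[SmartRecord]) -> Dict[int, List[str]]:
        # Step 3 (simplified): flag a disk when any watched attribute is non-zero.
        warnings: Dict[int, List[str]] = {}
        for rec in records:
            hits = [name for attr, name in WATCHED_ATTRIBUTES.items()
                    if rec.raw.get(attr, 0) > 0]
            if hits:
                warnings[rec.slot] = hits
        return warnings


card = ExpansionCard({
    0: {5: 0, 197: 0, 198: 0},   # healthy disk
    1: {5: 12, 197: 3, 198: 0},  # disk with reallocated and pending sectors
})
bmc = Bmc(card)
report = bmc.predict(bmc.collect())
print(report)  # {1: ['Reallocated_Sector_Ct', 'Current_Pending_Sector']}
```

In a real deployment the aggregated records would arrive over the out-of-band link rather than a direct method call, and the threshold rule would presumably be replaced by whatever model the BMC and PC side actually run.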
In summary, the present invention uses a SAS Expander to obtain the SMART information of hard disks out of band, and thereby implements hard disk failure prediction and early warning.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the method disclosed in the embodiments corresponds to the system disclosed in the embodiments, it is described relatively briefly; for related details, refer to the description of the method part.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions differently for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, systems, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit.
Similarly, the processing units in the embodiments of the present invention may be integrated into one functional module, each processing unit may exist physically on its own, or two or more processing units may be integrated into one functional module.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should be noted that in this document relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The out-of-band hard disk failure prediction method, system, device, and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211574304.XA CN116166495A (en) | 2022-12-08 | 2022-12-08 | An out-of-band hard disk failure prediction system, method, device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211574304.XA CN116166495A (en) | 2022-12-08 | 2022-12-08 | An out-of-band hard disk failure prediction system, method, device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116166495A (en) | 2023-05-26 |
Family
ID=86419067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211574304.XA Pending CN116166495A (en) | 2022-12-08 | 2022-12-08 | An out-of-band hard disk failure prediction system, method, device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116166495A (en) |
- 2022-12-08: CN application CN202211574304.XA, patent/CN116166495A/en, active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112286709B (en) | A kind of server hardware fault diagnosis method, diagnosis device and diagnosis equipment | |
CN113626869B (en) | Data processing method, system, electronic device and storage medium | |
CN104166611A (en) | Hard disk temperature information acquisition device and method | |
CN112672405B (en) | Power consumption calculation method, device, storage medium, electronic equipment and server | |
CN108090000A (en) | A kind of method and system for obtaining CPU register informations | |
CN113900718B (en) | A method, system and device for decoupling BMC and BIOS asset information | |
CN115576779A (en) | Server hardware information management method, device, computer equipment and storage medium | |
CN106919490A (en) | Server failure detection method and device | |
US11507446B1 (en) | Hot-swap controller fault reporting system | |
CN116166495A (en) | An out-of-band hard disk failure prediction system, method, device and readable storage medium | |
CN108491299A (en) | A kind of signal detection board and the mainboard for signal detection | |
CN109558300B (en) | Whole cabinet alarm processing method and device, terminal and storage medium | |
CN115525517A (en) | Mixed-insertion hard disk lamp signal control system, method and device and readable storage medium | |
CN114003419B (en) | A method, system and device for automatically testing memory RAS characteristics based on OSES | |
CN111309511A (en) | A method, device and terminal for processing application operation data | |
CN117201356A (en) | Communication equipment online monitoring management method, system, electronic equipment and medium | |
CN116680151A (en) | Dynamic monitoring method, system, terminal and storage medium for hard disk performance | |
CN115470054A (en) | A server memory function testing method, system, device and storage medium | |
CN116628737A (en) | File reading method, device, server, electronic device and storage medium | |
CN116560922A (en) | Method, system and device for power-on and power-off testing of server and readable storage medium | |
CN115827382A (en) | A switch status information monitoring system, method, device and storage medium | |
CN115129566A (en) | Method, system, equipment and storage medium for verifying bandwidth performance of hard disk backplane | |
WO2020248754A1 (en) | Electronic device and cluster server system | |
CN108540337B (en) | A dual network port POS machine and its network state monitoring system and method | |
CN117634436A (en) | Work order data transmission method, device, server and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||