CN106294065A - Hard disk failure monitoring method, Apparatus and system - Google Patents

Hard disk failure monitoring method, apparatus and system

Info

Publication number
CN106294065A
Authority
CN
China
Prior art keywords
hard disk
determining
state data
information
bus
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610609204.4A
Other languages
Chinese (zh)
Inventor
范瑞展
缪亦奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201610609204.4A
Publication of CN106294065A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the present application provide a hard disk failure monitoring method, apparatus and system. The method obtains the current actual consumed life of a hard disk from the state data of the hard disk and calculates a risk coefficient of the hard disk from the actual consumed life and the preset life of the hard disk. The risk coefficient indicates the probability that the hard disk will fail, so that a user can discover a hard disk abnormality in time and can tell in advance, from the risk coefficient, whether the hard disk needs to be replaced, thereby avoiding the catastrophic consequences of data loss caused by hard disk damage.

Description

Hard disk fault monitoring method, device and system
Technical Field
The application relates to the technical field of hard disks, in particular to a hard disk fault monitoring method, device and system.
Background
The hard disk is the most important storage device in an electronic device. It carries the data and information of the device's users, and a large amount of important data is often stored on it. The mean time between failures of most hard disks reaches 30,000 to 50,000 hours or more; however, for many users, especially commercial users, even a single ordinary hard disk failure can have catastrophic consequences. Discovering hard disk abnormalities in time is therefore a basic prerequisite for keeping the electronic device running stably and protecting data safety.
Disclosure of Invention
In view of this, the present invention provides a hard disk failure monitoring method, apparatus and system, to overcome the prior-art problems of data loss and reduced stability of the electronic device that arise when a hard disk abnormality is not discovered in time.
To achieve this purpose, the invention provides the following technical solutions.
A hard disk failure monitoring method comprises the following steps:
acquiring the current actual consumed life of a hard disk according to state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
and calculating the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, wherein the risk coefficient indicates probability information of the hard disk failing.
Preferably, the method further comprises the following steps:
determining a source hard disk with a risk coefficient greater than or equal to a first preset value;
determining a target hard disk meeting preset conditions;
and generating a hard disk migration instruction, wherein the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk.
Preferably, the method further comprises the following steps:
determining the current danger level of the hard disk according to the risk coefficient;
and outputting alarm information corresponding to the danger level.
A hard disk failure monitoring device, comprising:
an acquisition module, configured to acquire the current actual consumed life of a hard disk according to state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
and a calculation module, configured to calculate the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, wherein the risk coefficient indicates probability information of the hard disk failing.
A hard disk failure monitoring system comprises a baseboard management controller, a bus and a monitor, wherein the baseboard management controller is connected with the monitor through the bus;
the monitor is configured to monitor state data of the hard disk and transmit the state data to the baseboard management controller through the bus, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
the baseboard management controller is configured to acquire the current actual consumed life of the hard disk according to the state data of the hard disk, and to calculate the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, wherein the risk coefficient indicates probability information of the hard disk failing.
Preferably, the baseboard management controller is further configured to:
determining a source hard disk with a risk coefficient greater than or equal to a first preset value;
determining a target hard disk meeting preset conditions;
generating a hard disk migration instruction, and sending the hard disk migration instruction to the monitor through the bus, wherein the hard disk migration instruction carries address information of the source hard disk and address information of the target hard disk;
the monitor is further configured to migrate the data of the source hard disk to the target hard disk according to the hard disk migration instruction.
The state data further includes the remaining storage space of the hard disk, and when determining the target hard disk that meets the preset condition, the baseboard management controller is specifically configured to perform one of the following:
determining the hard disk with the largest remaining storage space as the target hard disk;
or, among the hard disks whose risk coefficients are less than or equal to a third preset value, determining the hard disk with the largest remaining storage space as the target hard disk;
or, determining the hard disk with the minimum risk coefficient as the target hard disk.
Preferably, the baseboard management controller is further configured to:
determining the current danger level of the hard disk according to the risk coefficient;
and outputting alarm information corresponding to the danger level.
Wherein,
the bus is an I2C bus, and the monitor is an array controller;
or, the bus is a KCS bus, and the monitor is internally provided with an operating system and application software, wherein the operating system performs data migration operations on the hard disk through the application software.
A hard disk failure monitoring system comprising a processor and a memory, wherein:
the memory is used for storing state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
the processor is configured to acquire the current actual consumed life of the hard disk according to the state data of the hard disk stored in the memory, and to calculate the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, wherein the risk coefficient indicates probability information of the hard disk failing.
As can be seen from the above technical solutions, compared with the prior art, the hard disk failure monitoring method provided by the embodiments of the invention obtains the current actual consumed life of the hard disk from the state data of the hard disk and calculates the risk coefficient of the hard disk from the actual consumed life and the preset life of the hard disk. The risk coefficient indicates probability information of the hard disk failing, so that a user can discover a hard disk abnormality in time and can tell in advance, from the risk coefficient, whether the hard disk needs to be replaced, thereby avoiding the catastrophic consequences of data loss caused by hard disk damage.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram illustrating the relationship between the working environment temperature of a hard disk and its failure rate;
FIG. 2 is a schematic flowchart of a hard disk failure monitoring method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for migrating data in a hard disk failure monitoring method according to an embodiment of the present application;
FIG. 4 is a schematic alarm diagram in a hard disk failure monitoring method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a hard disk failure monitoring device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a hard disk failure monitoring system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a specific implementation of a hard disk failure monitoring system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another implementation of the hard disk failure monitoring system according to an embodiment of the present application.
Detailed Description
For ease of reference and clarity, the technical terms and abbreviations used hereinafter are summarized as follows:
I2C bus: Inter-Integrated Circuit bus;
KCS: Keyboard Controller Style;
BMC: Baseboard Management Controller;
RAID: Redundant Array of Independent Disks, a disk array;
HDD: Hard Disk Drive.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A hard disk is an important storage element in an electronic device, and its life is directly correlated with temperature. FIG. 1 is a schematic diagram of the relationship between the working environment temperature of a hard disk and its probability of failure (hereinafter, failure rate). The abscissa represents the working environment temperature of the hard disk, and the ordinate represents the annual failure rate of the hard disk, that is, the probability of failure within one year.
In FIG. 1, curve 1 corresponds to a hard disk that accumulates 2400 Power-On Hours (POH) in one year, and curve 2 corresponds to a hard disk that accumulates 8760 POH in one year.
It can be seen from FIG. 1 that the Annual Failure Rate (AFR) multiplies as the temperature rises from 30 ℃ to 70 ℃.
At present, the health of a hard disk is judged by checking whether it has bad tracks. In other words, the user replaces the hard disk only after it has actually gone bad, by which time the data stored on it may already be lost.
At present, there is no method for monitoring the consumed life of a hard disk. Typically, after a hard disk is damaged, the BMC records an entry in the SEL (System Event Log), and the user then replaces the hard disk.
The hard disk failure monitoring method provided by the embodiments of the present application obtains the current risk coefficient of the hard disk, so that the user can judge from the risk coefficient whether the data on the hard disk needs to be migrated, thereby avoiding the risk of damage to the data stored on the hard disk.
FIG. 2 is a schematic flowchart of a hard disk failure monitoring method provided in an embodiment of the present application. The method includes:
Step S201: acquiring the current actual consumed life of the hard disk according to the state data of the hard disk.
The state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time.
Hard disk manufacturers currently obtain an AFR of 0.73% through extensive testing at a hard disk workload of 50% and a working environment temperature of 40 ℃. That is, the annual failure rate of the hard disk is 0.73% when the working environment temperature is 40 ℃ and the workload is 50%. In actual use, however, the working environment temperature is not necessarily 40 ℃, and the load changes with the operating state of the electronic device rather than staying at 50%.
Moreover, any given electronic device contains only one or a few hard disks, whereas 0.73% is an annual failure rate obtained from a sample of thousands of hard disks. Therefore, 0.73% does not represent the annual failure rate of each individual hard disk in actual use, and it is important to estimate in advance the life of the one or more hard disks in each electronic device.
The current actual consumed life L_real of the hard disk can be calculated by the following formula:
[formula image missing]
where T is the power-on time accumulated from when the hard disk was first used up to the present, temperature(t) represents the actual working environment temperature of the hard disk at time t, and load(t) represents the actual workload of the hard disk at time t.
It should be noted that the above formula does not constitute a limitation of the present application, and those skilled in the art can design the formula according to the technical idea provided by the present invention and the practical application requirements.
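Because the formula itself does not survive in this text, the following Python sketch only illustrates the general idea of accumulating consumed life over the power-on time from sampled temperature and load values. The weighting function `stress`, its product form, the reference point of 40 ℃ and 50% load (borrowed from the manufacturer test conditions mentioned above), and every name in the block are assumptions made for illustration, not the formula of the application.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    """One state-data sample reported for a hard disk (hypothetical layout)."""
    hours: float        # power-on time covered by this sample, in hours
    temperature: float  # working environment temperature, in degrees Celsius
    load: float         # workload as a fraction between 0.0 and 1.0


def stress(temperature: float, load: float) -> float:
    """Hypothetical weighting that equals 1.0 at 40 C and 50% load.

    The product form is an assumption for illustration only.
    """
    return (temperature / 40.0) * (load / 0.5)


def consumed_life(samples: Sequence[Sample],
                  weight: Callable[[float, float], float] = stress) -> float:
    """Sum weighted power-on hours; a discrete stand-in for an
    accumulation over the power-on time T."""
    return sum(s.hours * weight(s.temperature, s.load) for s in samples)


samples = [
    Sample(hours=1000.0, temperature=40.0, load=0.5),  # reference conditions
    Sample(hours=1000.0, temperature=60.0, load=0.5),  # hotter, weighted 1.5x
]
l_real = consumed_life(samples)  # weighted hours consumed so far
```

Passing a different `weight` function lets the same accumulation be reused with whatever stress model a designer chooses, which matches the note that the formula is not limiting.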
Step S202: calculating the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk.
The risk coefficient indicates probability information of the hard disk failing.
The preset life L_total of the hard disk may be given by [formula image missing], where T1 represents a time specified by the hard disk manufacturer, such as 1 year, 2 years, or 5 years.
The risk coefficient X may be X = L_real / L_total.
It should be noted that the above formula does not constitute a limitation of the present application, and those skilled in the art can design the formula according to the technical idea provided by the present invention and the practical application requirements.
It can be understood that the closer the actual consumed life of the hard disk is to its preset life, the greater the probability that the hard disk will fail. Whether to replace the hard disk can therefore be decided according to the risk coefficient X.
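As a small worked example of the ratio X = L_real / L_total, the sketch below assumes both lives are expressed in the same unit (here, hours) and uses an illustrative preset life of 5 years at 8760 hours per year. The function name and the sample numbers are assumptions, not values from the application.

```python
def risk_coefficient(l_real: float, l_total: float) -> float:
    """Return X = L_real / L_total, both given in the same unit."""
    if l_total <= 0:
        raise ValueError("preset life must be positive")
    return l_real / l_total


# Illustrative numbers only: a preset life of 5 years of continuous power-on
# time, and a consumed life equal to 4 of those years.
l_total = 5 * 8760.0
l_real = 4 * 8760.0
x = risk_coefficient(l_real, l_total)
print(f"risk coefficient X = {x:.0%}")
```

A value of X near or above 100% means the consumed life has reached the preset life, which is the situation the method is meant to flag in advance.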
The embodiment of the invention thus provides a hard disk failure monitoring method that obtains the current actual consumed life of the hard disk from the state data of the hard disk and calculates the risk coefficient of the hard disk from the actual consumed life and the preset life of the hard disk. The risk coefficient indicates probability information of the hard disk failing, so that a user can discover a hard disk abnormality in time and can tell in advance, from the risk coefficient, whether the hard disk needs to be replaced, thereby avoiding the catastrophic consequences of data loss caused by hard disk damage.
When a plurality of hard disks are installed in the electronic device, the electronic device can automatically migrate the data stored on a hard disk that may fail to another hard disk. Accordingly, the hard disk failure monitoring method may further include a method for migrating the data stored on a hard disk, as shown in FIG. 3. The method includes:
Step S301: determining the source hard disk whose risk coefficient is greater than or equal to the first preset value.
The first preset value may be determined according to actual conditions, for example, the first preset value may be 80%, 90%, 100%, and so on.
In the embodiment of the application, a hard disk with a risk coefficient greater than or equal to a first preset value in electronic equipment is called a source hard disk.
Step S302: determining the target hard disk that meets the preset condition.
In the embodiment of the application, a hard disk meeting preset conditions in the electronic equipment is called a target hard disk.
The state data may further include a remaining amount of storage space of the hard disk, and the preset condition may be: determining the hard disk with the largest storage space residual amount as the target hard disk; or determining the hard disk with the largest residual amount of the storage space from the hard disks with the risk coefficients less than or equal to a third preset value, and determining the hard disk as the target hard disk; or, determining the hard disk with the minimum risk coefficient as the target hard disk.
Step S303: generating a hard disk migration instruction, wherein the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk.
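Steps S301 to S303 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the `Disk` and `MigrationInstruction` types, the string addresses, and the rule that a source disk is never chosen as its own target are all assumptions. The target selection shown covers the first two preset conditions (largest remaining space, optionally restricted to disks whose risk coefficient does not exceed a third preset value).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Disk:
    address: str      # hypothetical stand-in for "address information"
    risk: float       # risk coefficient X of this disk
    free_bytes: int   # remaining storage space


@dataclass(frozen=True)
class MigrationInstruction:
    source_address: str
    target_address: str


def pick_target(disks: Sequence[Disk], source: Disk,
                third_preset: Optional[float] = None) -> Optional[Disk]:
    """Pick the disk with the most remaining space, never the source itself.

    If third_preset is given, only disks with risk <= third_preset qualify.
    """
    candidates = [d for d in disks if d.address != source.address]
    if third_preset is not None:
        candidates = [d for d in candidates if d.risk <= third_preset]
    return max(candidates, key=lambda d: d.free_bytes, default=None)


def plan_migrations(disks: Sequence[Disk], first_preset: float,
                    third_preset: Optional[float] = None
                    ) -> List[MigrationInstruction]:
    """Find sources (S301), choose targets (S302), build instructions (S303)."""
    plan = []
    for source in disks:
        if source.risk >= first_preset:                        # S301
            target = pick_target(disks, source, third_preset)  # S302
            if target is not None:
                plan.append(MigrationInstruction(source.address,
                                                 target.address))  # S303
    return plan


disks = [
    Disk("hdd1", risk=0.90, free_bytes=100),
    Disk("hdd2", risk=0.30, free_bytes=500),
    Disk("hdd3", risk=0.20, free_bytes=300),
]
plan = plan_migrations(disks, first_preset=0.85, third_preset=0.50)
```

The third preset condition (minimum risk coefficient) would replace the `max` over `free_bytes` with a `min` over `risk`.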
It can be understood that alarm information can be output when the risk coefficient of the hard disk is greater than or equal to a second preset value. For example, no alarm is given when the risk coefficient is 30%, and an alarm is given when it is 80% or more. The second preset value is determined according to actual conditions. For example, it may be set according to the importance of the data stored on the hard disk: the more important the data, the smaller the second preset value. The second preset value may be set by the user through the display screen of the electronic device, or may be preset before the electronic device leaves the factory.
The alarm information output for different risk coefficients may be the same or different. In the first case, the same alarm information is output whether the risk coefficient of the hard disk is 80% or 100%; for example, the display screen of the electronic device shows a message that the hard disk is about to fail. In the second case, different alarm information is output when the risk coefficient is 80% and when it is 100%. Specifically, the hard disk failure monitoring method may further include: determining the current danger level of the hard disk according to the risk coefficient; and outputting alarm information corresponding to the danger level.
For example, a risk coefficient of 80% to 90% corresponds to the second danger level, 91% to 100% corresponds to the first danger level, and so on. The first danger level may correspond to a red alarm and the second danger level to a yellow alarm.
As shown in FIG. 4, HDD 1 represents a hard disk. When the BMC 41 determines that the risk coefficient X of HDD 1 satisfies 90% ≥ X ≥ 80%, the life of HDD 1 is approaching its specified limit, and a yellow alarm may be output. The yellow alarm alerts the user to the health of HDD 1 and to the possible need to prepare for data migration, ready a spare hard disk for replacement, and so on.
When the risk coefficient X of HDD 1 satisfies 100% ≥ X ≥ 91%, HDD 1 is at risk of damage, and a red alarm may be output. The red alarm prompts the user to migrate the data and replace HDD 1, so as to avoid data loss from possible damage to HDD 1.
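The mapping from risk coefficient to alarm colour in this example can be written as a small Python function. The thresholds follow the example values in the text; treating everything above 90% (including the unstated range between 90% and 91%, and values above 100%) as a red alarm is an assumption, as is the function name.

```python
from typing import Optional


def alarm_for(risk: float) -> Optional[str]:
    """Map a risk coefficient X (as a fraction) to an alarm colour.

    Thresholds follow the example in the text. Values above 0.90 are all
    treated as "red", which is an assumption for the ranges the text
    leaves unspecified.
    """
    if risk > 0.90:
        return "red"     # first danger level
    if risk >= 0.80:
        return "yellow"  # second danger level
    return None          # below the alarm threshold, no alarm


for x in (0.30, 0.85, 0.95):
    print(f"X = {x:.0%}: alarm = {alarm_for(x)}")
```

In practice the two thresholds would be configurable, since the text notes that the second preset value may depend on how important the stored data is.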
In addition to the above hard disk failure monitoring method, an embodiment of the present application further provides a hard disk failure monitoring device. For the modules of the device, refer to the descriptions of the corresponding steps of the hard disk failure monitoring method; they are not repeated here. FIG. 5 is a schematic structural diagram of the hard disk failure monitoring device provided in the present application. The device includes an obtaining module 501 and a calculating module 502, wherein:
The obtaining module 501 is configured to obtain the current actual consumed life of a hard disk according to state data of the hard disk, where the state data includes temperature information of the hard disk at each time and load information of the hard disk at each time.
The calculating module 502 is configured to calculate the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, where the risk coefficient indicates probability information of the hard disk failing.
In the hard disk failure monitoring device provided by the embodiment of the invention, the obtaining module 501 obtains the current actual consumed life of the hard disk from the state data of the hard disk, and the calculating module 502 calculates the risk coefficient of the hard disk from the actual consumed life and the preset life of the hard disk. The risk coefficient indicates probability information of the hard disk failing, so that a user can discover a hard disk abnormality in time and can tell in advance, from the risk coefficient, whether the hard disk needs to be replaced, thereby avoiding the catastrophic consequences of data loss caused by hard disk damage.
When a plurality of hard disks are installed on the electronic device, the electronic device may automatically migrate data stored in the hard disk that may have a fault to another hard disk, and therefore the hard disk fault monitoring apparatus may further include:
a first determining module, configured to determine the source hard disk whose risk coefficient is greater than or equal to a first preset value;
a second determining module, configured to determine the target hard disk that meets the preset condition.
The state data may further include a remaining amount of storage space of the hard disk, and the preset condition may be: determining the hard disk with the largest storage space residual amount as the target hard disk; or determining the hard disk with the largest residual amount of the storage space from the hard disks with the risk coefficients less than or equal to a third preset value, and determining the hard disk as the target hard disk; or, determining the hard disk with the minimum risk coefficient as the target hard disk.
A generating module, configured to generate a hard disk migration instruction, where the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk.
Any one of the above hard disk failure monitoring devices may further include:
a third determining module, configured to determine the current danger level of the hard disk according to the risk coefficient; and an output module, configured to output alarm information corresponding to the danger level.
For details, reference may be made to the description of fig. 4, which is not described herein again.
An embodiment of the present application further provides a hard disk failure monitoring system. As shown in FIG. 6, the hard disk failure monitoring system includes a BMC 41, a bus 61, and a monitor 62, where the BMC 41 is connected to the monitor 62 through the bus 61.
The monitor 62 is configured to monitor state data of the hard disk and transmit the state data to the BMC 41 through the bus 61, where the state data includes temperature information of the hard disk at each time and load information of the hard disk at each time.
The bus 61 may be an I2C bus or a KCS bus or the like.
The BMC 41 is configured to obtain the current actual consumed life of the hard disk according to the state data of the hard disk, and to calculate the risk coefficient of the hard disk according to the actual consumed life and the preset life of the hard disk, where the risk coefficient indicates probability information of the hard disk failing.
Some electronic devices already include a BMC, but the BMC in prior-art electronic devices does not have the functions of the BMC 41 in the embodiments of the present application. These functions can be embedded in the code of the existing BMC, so they are implemented without adding extra hardware, that is, without increasing the hardware cost.
For a detailed description of the monitor 62 and the BMC 41, refer to the detailed description of the corresponding steps of the hard disk failure monitoring method in FIG. 2; details are not repeated here.
In the above hard disk failure monitoring system, the baseboard management controller is further configured to: determine a source hard disk whose risk coefficient is greater than or equal to a first preset value; determine a target hard disk that meets the preset condition; and generate a hard disk migration instruction and send it to the monitor through the bus, where the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk. The monitor is further configured to migrate the data of the source hard disk to the target hard disk according to the hard disk migration instruction.
The state data further includes the remaining storage space of the hard disk, and when determining the target hard disk that meets the preset condition, the baseboard management controller is specifically configured to: determine the hard disk with the largest remaining storage space as the target hard disk; or, among the hard disks whose risk coefficients are less than or equal to a third preset value, determine the hard disk with the largest remaining storage space as the target hard disk; or determine the hard disk with the minimum risk coefficient as the target hard disk.
In any one of the above hard disk failure monitoring systems, the baseboard management controller is further configured to: determine the current danger level of the hard disk according to the risk coefficient; and output alarm information corresponding to the danger level.
For details, reference may be made to the description of fig. 4, which is not described herein again.
In order to make persons skilled in the art understand the hard disk failure monitoring system provided in the embodiments of the present application, two specific examples are described below to illustrate an implementation process of the hard disk failure monitoring system.
Please refer to fig. 7, which is a schematic structural diagram of a specific implementation manner of the hard disk failure monitoring system according to the embodiment of the present application.
The bus 61 is an I2C bus, and the monitor 62 may be an array controller 71 in a RAID. The HDDs are the hard disks in the RAID; to describe the hard disk failure monitoring system more clearly, the array controller 71 and the HDDs of the RAID are drawn separately in FIG. 7. FIG. 7 shows two hard disks, HDD 1 and HDD 2. It is understood that the number of hard disks may be 1, in which case the electronic device cannot automatically migrate the data stored on the hard disk and the hard disk must be replaced by the user. The number of hard disks may also be 2 or more, in which case the electronic device may or may not automatically migrate the data stored on the hard disks.
The BMC cannot obtain the state data of the hard disks directly; the array controller 71 must transmit the state data of each hard disk, namely HDD 1 and HDD 2, to the BMC 41 through the I2C bus. For each hard disk, the BMC 41 obtains the current actual consumed life L_real1 (or L_real2) of HDD 1 (or HDD 2) from the state data of HDD 1 (or HDD 2), and calculates the risk coefficient X1 (or X2) of HDD 1 (or HDD 2) from that actual consumed life and the preset life L_total1 (or L_total2) of HDD 1 (or HDD 2).
The BMC 41 may determine whether X1 and X2 are greater than or equal to the second preset value (for example, 80%). If X1 is 90%, which is greater than or equal to 80%, and X2 is 30%, the BMC 41 may output alarm information for HDD 1, such as a yellow alarm.
If the BMC 41 is configured so that data on any hard disk whose risk coefficient is greater than or equal to the first preset value (assumed to be 85%) needs to be migrated, the BMC 41 determines that the data on HDD 1 needs to be migrated. The BMC 41 may further calculate the optimal data migration location; if that location is HDD 2, it generates a hard disk migration instruction that includes the address information of HDD 1 and the address information of HDD 2. After receiving the hard disk migration instruction, the array controller 71 migrates the data stored on HDD 1 to HDD 2.
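The numbers in this example (X1 = 90%, X2 = 30%, a second preset value of 80%, a first preset value of 85%) can be traced through a short Python sketch of the decision the BMC makes. The function `bmc_evaluate`, the use of disk names as addresses, the choice of the lowest-risk disk as the "optimal" target, and the limit of one migration pair per evaluation are all assumptions for illustration; the transfer over the I2C bus is not modelled.

```python
from typing import Dict, List, Optional, Tuple

SECOND_PRESET = 0.80  # alarm threshold used in the example
FIRST_PRESET = 0.85   # migration threshold used in the example


def bmc_evaluate(risks: Dict[str, float]
                 ) -> Tuple[List[str], Optional[Tuple[str, str]]]:
    """Return (disks to raise an alarm for, optional (source, target) pair).

    Simplification: at most one migration pair is produced, and the target
    is the other disk with the lowest risk coefficient.
    """
    alarms = [name for name, x in risks.items() if x >= SECOND_PRESET]
    sources = [name for name, x in risks.items() if x >= FIRST_PRESET]
    if not sources:
        return alarms, None
    source = max(sources, key=lambda name: risks[name])
    others = [name for name in risks if name != source]
    if not others:
        return alarms, None  # single disk: no automatic migration possible
    target = min(others, key=lambda name: risks[name])
    return alarms, (source, target)


alarms, instruction = bmc_evaluate({"HDD 1": 0.90, "HDD 2": 0.30})
print(alarms, instruction)
```

With a single disk the function returns no migration pair, which mirrors the statement that a lone hard disk must be replaced by the user.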
Please refer to Fig. 8, which is a schematic structural diagram of another implementation of the hard disk failure monitoring system according to the embodiment of the present application.
In this implementation, the bus 61 is a KCS bus, and the monitor 62 may be internally provided with an operating system (OS) 81 and application software 82. The operating system 81 may obtain the status data of the hard disks HDD 1 and HDD 2 through the application software 82, and may also perform data migration operations on the hard disks through the application software 82.
After obtaining the status data of each hard disk, for example HDD 1 and HDD 2, through the application software 82, the operating system 81 may transmit the status data to the BMC 41 through the KCS bus (the monitor 62 may be connected to the KCS bus through a basic input/output system (BIOS) 83).
The operating system 81 may obtain the state data by performing a full read and write of the registers or disk-surface sectors of a designated hard disk.
For each hard disk, the BMC 41 obtains the current actual consumption life L_real1 (or L_real2) of HDD 1 (or HDD 2) from the status data of that disk, and calculates the risk coefficient X1 (or X2) of HDD 1 (or HDD 2) failing from the actual consumption life and the preset life L_total1 (or L_total2) of that disk.
The BMC 41 may determine whether X1 and X2 are greater than or equal to a second preset value (e.g., 80%). If X1 is 90%, that is, greater than or equal to 80%, and X2 is 30%, the BMC 41 may output alarm information for HDD 1, such as a yellow alarm.
If the BMC 41 determines that data must be migrated from any hard disk whose risk coefficient is greater than or equal to the first preset value (assumed to be 85%), it determines that the data in HDD 1 needs to be migrated. The BMC 41 may then calculate an optimal data migration location and, if that location is HDD 2, generate a hard disk migration instruction that includes the address information of HDD 1 and the address information of HDD 2. The hard disk migration instruction may be transmitted to the operating system 81 in the monitor 62 through the KCS bus and the BIOS 83, and the operating system 81 migrates the data stored in HDD 1 to HDD 2 through the application software 82.
The hard disk failure monitoring system provided by the embodiment of the present application may be an electronic device such as a computer, a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, or a vehicle-mounted computer.
The hard disk failure monitoring system may include a memory and a processor.
The memory may be used to store software programs and modules, and the processor executes the various functional applications and data processing of the electronic device by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as the function of calculating a risk coefficient); the data storage area may store data created through use of the electronic device (such as status data of a hard disk). Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor is the control center of the electronic device. It connects the various parts of the whole electronic device through various interfaces and lines, and performs the functions of the electronic device and processes its data by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory, thereby monitoring the electronic device as a whole. Optionally, the processor may include one or more processing units; preferably, the processor may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor.
The processor in the embodiment of the application can obtain the current actual consumption life of the hard disk according to the state data of the hard disk stored in the memory, and calculate the risk coefficient of the hard disk failure according to the actual consumption life and the preset life of the hard disk.
The memory may further store a first preset value and a preset condition, and the processor may be further configured to: determine a source hard disk whose risk coefficient is greater than or equal to the first preset value; determine a target hard disk meeting the preset condition; and generate a hard disk migration instruction, wherein the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk.
The processor may be further configured to: determine the hard disk with the largest remaining storage space as the target hard disk; or determine, from among the hard disks whose risk coefficients are less than or equal to a third preset value, the hard disk with the largest remaining storage space as the target hard disk; or determine the hard disk with the smallest risk coefficient as the target hard disk.
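The three alternative target-selection rules can be written side by side for comparison. This is a sketch only: the `Disk` record, the function names and the sample values are hypothetical, and the candidate list is assumed to already exclude the source hard disk.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    risk: float      # risk coefficient in [0, 1]
    free_bytes: int  # remaining storage space


def by_most_free(disks):
    """Rule 1: the disk with the largest remaining storage space."""
    return max(disks, key=lambda d: d.free_bytes, default=None)


def by_most_free_below_risk(disks, third_preset):
    """Rule 2: largest remaining space among disks with risk <= third preset value."""
    safe = [d for d in disks if d.risk <= third_preset]
    return max(safe, key=lambda d: d.free_bytes, default=None)


def by_lowest_risk(disks):
    """Rule 3: the disk with the smallest risk coefficient."""
    return min(disks, key=lambda d: d.risk, default=None)


candidates = [
    Disk("A", risk=0.70, free_bytes=900),
    Disk("B", risk=0.30, free_bytes=500),
    Disk("C", risk=0.10, free_bytes=200),
]
picked = (
    by_most_free(candidates).name,
    by_most_free_below_risk(candidates, third_preset=0.50).name,
    by_lowest_risk(candidates).name,
)
```

On this sample the three rules pick different disks (A, B and C respectively), which shows the trade-off: the first rule favours capacity, the third favours health, and the second balances the two by filtering on risk before comparing free space.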
The memory may further store a correspondence between risk levels and alarm information, and the processor may be further configured to: determine the risk level of the current hard disk according to its risk coefficient; and output the alarm information corresponding to that risk level.
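A stored correspondence of this kind can be sketched as a lookup table. The level names, the boundaries and the blue and red alarms below are invented for illustration; only the yellow alarm at a risk coefficient of 80% or more comes from the example in the description.

```python
from bisect import bisect_right

# Hypothetical level boundaries: a coefficient equal to a boundary
# falls into the higher level ("greater than or equal to").
LEVEL_BOUNDS = [0.60, 0.80, 0.95]
LEVELS = ["normal", "notice", "warning", "critical"]

# Stored correspondence between level and alarm information.
ALARMS = {
    "normal": None,
    "notice": "blue alarm",
    "warning": "yellow alarm",
    "critical": "red alarm",
}


def danger_level(risk):
    """Map a risk coefficient in [0, 1] to a level name."""
    return LEVELS[bisect_right(LEVEL_BOUNDS, risk)]


def alarm_for(risk):
    """Return the alarm information for a risk coefficient, or None."""
    return ALARMS[danger_level(risk)]


hdd1_alarm = alarm_for(0.90)  # HDD 1 in the example
hdd2_alarm = alarm_for(0.30)  # HDD 2 in the example
```

Keeping the boundaries and the alarm table as data rather than branching logic means the correspondence can be changed in storage without touching the lookup code.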
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A hard disk failure monitoring method, characterized by comprising:
acquiring the current actual consumption life of a hard disk according to state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
and calculating the risk coefficient of the hard disk failing according to the actual consumption life and the preset life of the hard disk, wherein the risk coefficient indicates the probability of the hard disk failing.
2. The hard disk failure monitoring method according to claim 1, further comprising:
determining a source hard disk with a risk coefficient greater than or equal to a first preset value;
determining a target hard disk meeting preset conditions;
and generating a hard disk migration instruction, wherein the hard disk migration instruction carries the address information of the source hard disk and the address information of the target hard disk.
3. The hard disk failure monitoring method according to claim 1 or 2, further comprising:
determining the risk level of the current hard disk according to the risk coefficient;
and outputting alarm information corresponding to the risk level.
4. A hard disk failure monitoring device, comprising:
an acquisition module, used for acquiring the current actual consumption life of a hard disk according to state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
and a calculation module, used for calculating the risk coefficient of the hard disk failing according to the actual consumption life and the preset life of the hard disk, wherein the risk coefficient indicates the probability of the hard disk failing.
5. A hard disk failure monitoring system, characterized by comprising a baseboard management controller, a bus and a monitor, wherein the baseboard management controller is connected with the monitor through the bus;
the monitor is used for monitoring state data of the hard disk and transmitting the state data to the baseboard management controller through the bus, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
the baseboard management controller is used for acquiring the current actual consumption life of the hard disk according to the state data of the hard disk, and calculating the risk coefficient of the hard disk failing according to the actual consumption life and the preset life of the hard disk, wherein the risk coefficient indicates the probability of the hard disk failing.
6. The hard disk failure monitoring system of claim 5, wherein the baseboard management controller is further configured to:
determining a source hard disk with a risk coefficient greater than or equal to a first preset value;
determining a target hard disk meeting preset conditions;
generating a hard disk migration instruction, and sending the hard disk migration instruction to the monitor through the bus, wherein the hard disk migration instruction carries address information of the source hard disk and address information of the target hard disk;
the monitor is further configured to: and migrating the data of the source hard disk to the target hard disk according to the hard disk migration instruction.
7. The hard disk failure monitoring system according to claim 6, wherein the state data further includes the remaining storage space of the hard disk, and when determining the target hard disk meeting the preset conditions, the baseboard management controller is specifically configured to:
determining the hard disk with the largest storage space residual amount as the target hard disk;
or determining the hard disk with the largest residual amount of the storage space from the hard disks with the risk coefficients less than or equal to a third preset value, and determining the hard disk as the target hard disk;
or, determining the hard disk with the minimum risk coefficient as the target hard disk.
8. The system for monitoring hard disk failures according to any of claims 5 to 7, wherein the baseboard management controller is further configured to:
determining the risk level of the current hard disk according to the risk coefficient;
and outputting alarm information corresponding to the risk level.
9. The hard disk failure monitoring system according to any of claims 5 to 7,
the bus is an I2C bus, and the monitor is an array controller;
or, the bus is a KCS bus, and the monitor is internally provided with an operating system and application software, wherein the operating system executes data migration operations on the hard disk through the application software.
10. A hard disk failure monitoring system comprising a processor and a memory, wherein:
the memory is used for storing state data of the hard disk, wherein the state data comprises temperature information of the hard disk at each time and load information of the hard disk at each time;
the processor is used for acquiring the current actual consumption life of the hard disk according to the state data of the hard disk stored in the memory, and calculating the risk coefficient of the hard disk failing according to the actual consumption life and the preset life of the hard disk, wherein the risk coefficient indicates the probability of the hard disk failing.
CN201610609204.4A 2016-07-28 2016-07-28 Hard disk failure monitoring method, Apparatus and system Pending CN106294065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610609204.4A CN106294065A (en) 2016-07-28 2016-07-28 Hard disk failure monitoring method, Apparatus and system

Publications (1)

Publication Number Publication Date
CN106294065A true CN106294065A (en) 2017-01-04

Family

ID=57662687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610609204.4A Pending CN106294065A (en) 2016-07-28 2016-07-28 Hard disk failure monitoring method, Apparatus and system

Country Status (1)

Country Link
CN (1) CN106294065A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170104
