CN114116374A - Hard disk monitoring method, system, device and medium - Google Patents
- Publication number
- CN114116374A (application CN202111229365.8A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- monitoring parameter
- periodically
- threshold value
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3037—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B33/00—Constructional parts, details or accessories not provided for in the other groups of this subclass
- G11B33/14—Reducing influence of physical parameters, e.g. temperature change, moisture, dust
- G11B33/1406—Reducing the influence of the temperature
- G11B33/144—Reducing the influence of the temperature by detection, control, regulation of the temperature
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a hard disk monitoring method comprising the following steps: acquiring monitoring parameters; configuring, for each monitoring parameter, a corresponding threshold, acquisition interval, and alarm strategy according to the monitoring parameters; periodically acquiring corresponding values according to the acquisition interval corresponding to each monitoring parameter; judging whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of values acquired periodically, and the corresponding threshold; and reporting alarm information in response to an alarm being triggered. The invention also discloses a system, a computer device, and a readable storage medium. The scheme provided by the invention takes into account the diversity of SSD operating conditions, analyzes the current state of the SSD from multiple dimensions, and reports alarm information when an SSD failure is predicted.
Description
Technical Field
The invention relates to the field of hard disks, and in particular to a hard disk monitoring method, system, device, and storage medium.
Background
With the development and wide application of technologies such as the Internet, cloud computing, and the Internet of Things, massive amounts of data are generated continuously and must be processed and stored, and the rapid development of information technology places higher demands on storage system performance. Solid state disks (SSDs) are widely used because of their fast read/write speed and low energy consumption. As the number of P/E (program/erase) cycles increases, and under the influence of factors such as Tcross (temperature crossing, i.e., the difference between the write temperature and the read temperature), read disturb, and data retention, the NAND may enter an unstable state. This manifests as more data error correction flows being triggered and, in some cases, data decoding failures. These are all signs of abnormal SSD operation and affect the reliability of the SSD.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a hard disk monitoring method, including:
acquiring a monitoring parameter;
configuring a corresponding threshold value, a corresponding acquisition interval and a corresponding alarm strategy of each monitoring parameter according to the monitoring parameters;
periodically acquiring corresponding values according to the acquisition intervals corresponding to each monitoring parameter;
judging whether to trigger an alarm or not according to an alarm strategy corresponding to each monitoring parameter, a plurality of values acquired periodically and a corresponding threshold value;
and reporting alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad block number threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering the alarm of the corresponding grade according to the increment of the count in response to the increment of the count being larger than the increment threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
the temperature of each temperature sensor is periodically collected and the difference between each temperature sensor and the other sensors is calculated.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a hard disk monitoring system, including:
an obtaining module configured to obtain monitoring parameters;
a configuration module configured to configure, for each monitoring parameter, a corresponding threshold, acquisition interval, and alarm strategy according to the monitoring parameters;
an acquisition module configured to periodically acquire corresponding values according to the acquisition interval corresponding to each monitoring parameter;
a judging module configured to judge whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of values acquired periodically, and the corresponding threshold;
and an alarm module configured to report alarm information in response to an alarm being triggered.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of:
acquiring a monitoring parameter;
configuring a corresponding threshold value, a corresponding acquisition interval and a corresponding alarm strategy of each monitoring parameter according to the monitoring parameters;
periodically acquiring corresponding values according to the acquisition intervals corresponding to each monitoring parameter;
judging whether to trigger an alarm or not according to an alarm strategy corresponding to each monitoring parameter, a plurality of values acquired periodically and a corresponding threshold value;
and reporting alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad block number threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering the alarm of the corresponding grade according to the increment of the count in response to the increment of the count being larger than the increment threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
the temperature of each temperature sensor is periodically collected and the difference between each temperature sensor and the other sensors is calculated.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of:
acquiring a monitoring parameter;
configuring a corresponding threshold value, a corresponding acquisition interval and a corresponding alarm strategy of each monitoring parameter according to the monitoring parameters;
periodically acquiring corresponding values according to the acquisition intervals corresponding to each monitoring parameter;
judging whether to trigger an alarm or not according to an alarm strategy corresponding to each monitoring parameter, a plurality of values acquired periodically and a corresponding threshold value;
and reporting alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad block number threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering the alarm of the corresponding grade according to the increment of the count in response to the increment of the count being larger than the increment threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
the temperature of each temperature sensor is periodically collected and the difference between each temperature sensor and the other sensors is calculated.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
The invention has one of the following beneficial technical effects: the scheme provided by the invention takes into account the diversity of SSD operating conditions, analyzes the current state of the SSD from multiple dimensions, and reports alarm information when an SSD failure is predicted.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a hard disk monitoring method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a hard disk monitoring system according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
According to an aspect of the present invention, an embodiment of the present invention provides a hard disk monitoring method, as shown in fig. 1, which may include the steps of:
S1, acquiring monitoring parameters;
S2, configuring, for each monitoring parameter, a corresponding threshold, acquisition interval, and alarm strategy according to the monitoring parameters;
S3, periodically acquiring corresponding values according to the acquisition interval corresponding to each monitoring parameter;
S4, judging whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of values acquired periodically, and the corresponding threshold;
and S5, reporting alarm information in response to an alarm being triggered.
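As an illustration only, steps S1 to S5 could be sketched as follows. The names `ParamConfig`, `over_threshold`, and `poll_once`, and the idea of passing the current time in as `now`, are assumptions made for this sketch and do not appear in the patent; the alarm policy shown is one hypothetical example of a strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ParamConfig:
    """S2: per-parameter threshold, acquisition interval, and alarm policy."""
    threshold: float
    interval_s: float
    # Alarm policy: maps (collected values, threshold) to a message or None.
    policy: Callable[[List[float], float], Optional[str]]
    history: List[float] = field(default_factory=list)
    next_due: float = 0.0


def over_threshold(values: List[float], threshold: float) -> Optional[str]:
    """Hypothetical policy: alarm when the latest value exceeds the threshold."""
    if values and values[-1] > threshold:
        return f"latest value {values[-1]} exceeds threshold {threshold}"
    return None


def poll_once(
    configs: Dict[str, ParamConfig],
    readers: Dict[str, Callable[[], float]],
    now: float,
    report: Callable[[str, str], None],
) -> None:
    """Run S3 to S5 for one scheduler tick at time `now` (in seconds)."""
    for name, cfg in configs.items():
        if now < cfg.next_due:
            continue  # this parameter's acquisition interval has not elapsed
        cfg.history.append(readers[name]())  # S3: collect a value
        cfg.next_due = now + cfg.interval_s
        message = cfg.policy(cfg.history, cfg.threshold)  # S4: apply the policy
        if message is not None:
            report(name, message)  # S5: report alarm information
```

Keeping the policy as a per-parameter callable mirrors the idea that each monitoring parameter has its own alarm strategy, so bad-block, error-correction, and temperature checks can use different rules over the same collected history.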
The scheme provided by the invention takes into account the diversity of SSD operating conditions, analyzes the current state of the SSD from multiple dimensions, and reports alarm information when an SSD failure is predicted.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad block number threshold.
Specifically, it is possible to detect whether the total number of bad blocks on each physical LUN exceeds the limit. The threshold depends on the type of NAND flash chip. When a newly grown bad block (GBB) is detected, the accumulated bad block count of that LUN is checked against the threshold; if it exceeds the threshold, this is taken to indicate a quality problem with the individual NAND flash chip, and a severe alarm is raised.
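A minimal sketch of this check, assuming per-LUN bad block counts are available as dictionaries keyed by LUN index, is shown below. The function name `check_bad_blocks` and the data layout are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def check_bad_blocks(
    previous: Dict[int, int],
    current: Dict[int, int],
    threshold: int,
) -> List[Tuple[int, int]]:
    """Return (lun, count) for each LUN that gained bad blocks and now
    has an accumulated count above the threshold."""
    alarms = []
    for lun, count in current.items():
        grew = count > previous.get(lun, 0)  # a grown bad block was detected
        if grew and count > threshold:
            alarms.append((lun, count))  # candidate for a first-level alarm
    return alarms
```

The threshold is passed in rather than hard-coded because, as stated above, it depends on the type of NAND in use.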
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering the alarm of the corresponding grade according to the increment of the count in response to the increment of the count being larger than the increment threshold.
Specifically, a timer (e.g., 10 minutes) may be set to periodically poll the current count of each error correction type, where the error correction types include first error, read retry failure, and soft decoding failure. The difference from the previous round is computed to obtain the increase in each of the three error counts, and it is judged whether the increase within the detection period exceeds the corresponding threshold T0, T1, or T2.
It should be noted that the FW design takes into account the performance impact of the error correction flow. If the increase over a short time is large, the disk may be in an unstable state; in that case, current basic information, such as the temperature difference and the retention of the problem block, needs to be recorded, and an alarm is reported at the same time.
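The increment comparison can be sketched as below. The identifiers `ERROR_TYPES`, `count_increments`, and `exceeded` are hypothetical, and the mapping from an exceeded threshold to a specific alarm grade is left out because the text does not spell it out.

```python
from typing import Dict, List

# The three error correction types named in the text; T0, T1, T2 are the
# thresholds for these types, in this order (the key names are assumptions).
ERROR_TYPES = ("first_error", "read_retry_failure", "soft_decoding_failure")


def count_increments(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Difference between the counts polled in two adjacent periods."""
    return {t: current[t] - previous[t] for t in ERROR_TYPES}


def exceeded(increments: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Error types whose increase within the period is above its threshold."""
    return [t for t in ERROR_TYPES if increments[t] > thresholds[t]]
```

A caller would run this once per timer period, keep the latest counts as `previous` for the next round, and record supporting information whenever the returned list is non-empty.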
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
the temperature of each temperature sensor is periodically collected and the difference between each temperature sensor and the other sensors is calculated.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Specifically, it is possible to detect whether any of the multiple temperature sensors in the SSD is abnormal: 1) whether the value of each temperature sensor exceeds the operating temperature threshold; 2) whether the temperature difference between sensors is large, which, if so, indicates uneven heat dissipation within the disk. In addition, considering the influence of temperature difference on the NAND, a timer period (for example, 1 minute) is set, and the difference between the current temperature and the previous temperature is determined; if the difference exceeds a specific threshold T3, the abnormal information is recorded and an alarm is reported.
It should be noted that both excessively high temperature and a large temperature difference may have unpredictable effects on the correctness of data in the SSD, and this information can be recorded as a basis for failure analysis.
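The three temperature checks could be sketched as follows. The function name `check_temperatures`, its parameter names, and the message wording are assumptions for illustration; comparing the largest and smallest reading is used here as a shortcut for comparing every pair of sensors, and the choice of `>=` versus `>` at each boundary is likewise an assumption.

```python
from typing import List


def check_temperatures(
    current: List[int],
    previous: List[int],
    temp_limit: int,
    spread_limit: int,
    t3: int,
) -> List[str]:
    """Return findings for one sampling period.

    current / previous: readings of the same sensors in this and the last period.
    temp_limit: operating temperature threshold.
    spread_limit: allowed difference between sensors.
    t3: allowed change of one sensor between two periods (T3 in the text).
    """
    findings = []
    for i, temp in enumerate(current):
        if temp >= temp_limit:
            findings.append(f"sensor {i}: temperature abnormal ({temp})")
    # max - min reaches the limit exactly when some pair of sensors does.
    if current and max(current) - min(current) >= spread_limit:
        findings.append("uneven heat dissipation within the disk")
    for i, (now, last) in enumerate(zip(current, previous)):
        if abs(now - last) > t3:
            findings.append(f"sensor {i}: changed by {abs(now - last)} since last sample")
    return findings
```

Returning the findings as a list, rather than raising the alarm directly, makes it straightforward to log them as a basis for later failure analysis.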
The scheme provided by the invention takes into account the diversity of SSD operating conditions, analyzes the current state of the SSD from multiple dimensions, and reports alarm information when an SSD failure is predicted. By periodically updating key operating data and judging according to rules whether the SSD is operating normally, the scheme monitors the operating state of the SSD, makes suspicious operating states easy to detect, and also allows the performance and reliability of the SSD to be assessed.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a hard disk monitoring system 400, as shown in fig. 2, including:
an obtaining module 401 configured to obtain a monitoring parameter;
a configuration module 402 configured to configure a corresponding threshold, a corresponding acquisition interval, and a corresponding alarm policy for each monitoring parameter according to the monitoring parameter;
an acquisition module 403 configured to periodically acquire corresponding values according to the acquisition intervals corresponding to each monitoring parameter;
a judging module 404 configured to judge whether to trigger an alarm according to an alarm policy corresponding to each monitoring parameter, a plurality of values periodically collected, and a corresponding threshold;
and an alarm module 405 configured to report alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad block number threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering the alarm of the corresponding grade according to the increment of the count in response to the increment of the count being larger than the increment threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
the temperature of each temperature sensor is periodically collected and the difference between each temperature sensor and the other sensors is calculated.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
The scheme provided by the invention takes into account the diversity of SSD operating conditions, analyzes the current state of the SSD from multiple dimensions, and reports alarm information when an SSD failure is predicted. By periodically updating key operating data and judging according to rules whether the SSD is operating normally, the scheme monitors the operating state of the SSD, makes suspicious operating states easy to detect, and also allows the performance and reliability of the SSD to be assessed.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510, the memory 510 storing a computer program 511 executable on the processor, the processor 520 executing the program to perform the steps of:
S1, acquiring monitoring parameters;
S2, configuring, for each monitoring parameter, a corresponding threshold, acquisition interval, and alarm strategy;
S3, periodically acquiring corresponding values according to the acquisition interval corresponding to each monitoring parameter;
S4, judging whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of periodically acquired values, and the corresponding threshold;
S5, reporting alarm information in response to an alarm being triggered.
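Steps S1 to S5 can be pictured as a small configuration-driven loop. The sketch below is only one possible reading: `ParamConfig`, `latest_exceeds`, `run_monitor`, and the use of integer "ticks" in place of real time are all assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParamConfig:
    """S2: per-parameter threshold, acquisition interval and alarm strategy."""
    threshold: float
    interval: int  # acquisition interval, in simulated ticks
    policy: Callable[[List[float], float], bool]  # alarm strategy
    history: List[float] = field(default_factory=list)


def latest_exceeds(values: List[float], threshold: float) -> bool:
    """A hypothetical alarm strategy: alarm when the newest value exceeds the threshold."""
    return bool(values) and values[-1] > threshold


def run_monitor(configs: Dict[str, ParamConfig],
                read_value: Callable[[str, int], float],
                ticks: int) -> List[str]:
    """Run the acquire/judge/report cycle for a number of simulated ticks."""
    reports: List[str] = []
    for tick in range(ticks):
        for name, cfg in configs.items():  # S1: the monitored parameters
            if tick % cfg.interval != 0:
                continue  # not yet time to sample this parameter
            cfg.history.append(read_value(name, tick))  # S3: periodic acquisition
            if cfg.policy(cfg.history, cfg.threshold):  # S4: judge via the strategy
                reports.append(f"tick {tick}: alarm for {name}")  # S5: report
    return reports


configs = {"temperature": ParamConfig(threshold=60.0, interval=2, policy=latest_exceeds)}
reports = run_monitor(configs, lambda name, tick: 50.0 + 5.0 * tick, ticks=6)
```

Passing the whole history to the strategy, rather than only the latest value, mirrors the wording "a plurality of values acquired periodically": a strategy may look at trends or increments as well as single readings.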
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and, in response to detecting newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the current number of bad blocks on the corresponding physical LUN, or the accumulated number of bad blocks, being larger than the bad block number threshold.
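The bad-block rule might be sketched as below. The text does not say exactly how the accumulated number is derived, so this sketch assumes it is the sum of the current counts over all physical LUNs, recomputed whenever any LUN reports growth. The class `BadBlockTracker` and the two separate threshold parameters are likewise hypothetical.

```python
from typing import Dict, Optional


class BadBlockTracker:
    """Tracks per-LUN bad block counts between acquisition cycles."""

    def __init__(self, lun_threshold: int, total_threshold: int) -> None:
        self.lun_threshold = lun_threshold      # assumed per-LUN limit
        self.total_threshold = total_threshold  # assumed limit on the accumulated count
        self.last_counts: Dict[int, int] = {}
        self.accumulated = 0

    def update(self, counts: Dict[int, int]) -> Optional[str]:
        """Feed the current count for each physical LUN; return an alarm or None."""
        # Newly added bad blocks: some LUN reports more than it did last cycle.
        grew = any(n > self.last_counts.get(lun, 0) for lun, n in counts.items())
        self.last_counts = dict(counts)
        if not grew:
            return None

        # Assumption: the accumulated number is the total across all LUNs.
        self.accumulated = sum(counts.values())
        if (max(counts.values()) > self.lun_threshold
                or self.accumulated > self.total_threshold):
            return "level-1 alarm: bad block count exceeded"
        return None


tracker = BadBlockTracker(lun_threshold=10, total_threshold=25)
first = tracker.update({0: 3, 1: 4})    # growth, but below both limits
second = tracker.update({0: 3, 1: 4})   # no new bad blocks
third = tracker.update({0: 11, 1: 4})   # LUN 0 exceeds the per-LUN limit
```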
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and, in response to the increment of the count being larger than the increment threshold, triggering an alarm of the level corresponding to the size of the increment.
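The increment-based rule for error correction counts can be illustrated as follows. The error correction type names (`soft_decode`, `raid_recovery`), the threshold table, and the convention that a smaller level number is more severe are assumptions for the example; the text only states that the alarm level depends on the increment.

```python
from typing import Dict, List, Optional, Tuple


def alarm_level(increment: int, levels: List[Tuple[int, int]]) -> Optional[int]:
    """Map an increment to an alarm level.

    levels holds (increment_threshold, alarm_level) pairs; the level of the
    highest threshold that the increment exceeds is returned, or None.
    """
    hit: Optional[int] = None
    for threshold, level in sorted(levels):
        if increment > threshold:
            hit = level
    return hit


def check_error_counts(prev: Dict[str, int],
                       curr: Dict[str, int],
                       levels: List[Tuple[int, int]]) -> Dict[str, int]:
    """Compare counts from two adjacent periods and grade each error correction type."""
    alarms: Dict[str, int] = {}
    for kind, count in curr.items():
        # A type seen for the first time has no previous sample, so its increment is 0.
        increment = count - prev.get(kind, count)
        level = alarm_level(increment, levels)
        if level is not None:
            alarms[kind] = level
    return alarms


# Hypothetical grading table: larger increments map to more severe (lower) levels.
levels = [(10, 3), (100, 2), (1000, 1)]
graded = check_error_counts({"soft_decode": 100, "raid_recovery": 5},
                            {"soft_decode": 250, "raid_recovery": 8},
                            levels)
```

Working on increments rather than absolute counts means a drive with a long but stable error history does not alarm, while a sudden burst of corrections does.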
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of the temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the board in response to the temperature difference reaching the temperature difference threshold.
The scheme provided by the invention takes into account the diversity of SSD operation: it analyzes the current state of the SSD from multiple dimensions and reports alarm information when an SSD failure is predicted. By updating key operating data at regular intervals and judging by rule whether the SSD is operating normally, the scheme monitors the running state of the SSD, makes suspicious operating states easy to detect, and also allows the performance and reliability of the SSD to be assessed.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the following steps:
S1, acquiring monitoring parameters;
S2, configuring, for each monitoring parameter, a corresponding threshold, acquisition interval, and alarm strategy;
S3, periodically acquiring corresponding values according to the acquisition interval corresponding to each monitoring parameter;
S4, judging whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of periodically acquired values, and the corresponding threshold;
S5, reporting alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and, in response to detecting newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and triggering a first-level alarm in response to the current number of bad blocks on the corresponding physical LUN, or the accumulated number of bad blocks, being larger than the bad block number threshold.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
and, in response to the increment of the count being larger than the increment threshold, triggering an alarm of the level corresponding to the size of the increment.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold, further includes:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of the temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the board in response to the temperature difference reaching the temperature difference threshold.
The scheme provided by the invention takes into account the diversity of SSD operation: it analyzes the current state of the SSD from multiple dimensions and reports alarm information when an SSD failure is predicted. By updating key operating data at regular intervals and judging by rule whether the SSD is operating normally, the scheme monitors the running state of the SSD, makes suspicious operating states easy to detect, and also allows the performance and reliability of the SSD to be assessed.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed above are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist which, for brevity, are not described in detail. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A hard disk monitoring method, characterized by comprising the following steps:
acquiring a monitoring parameter;
configuring a corresponding threshold value, a corresponding acquisition interval and a corresponding alarm strategy of each monitoring parameter according to the monitoring parameters;
periodically acquiring corresponding values according to the acquisition intervals corresponding to each monitoring parameter;
judging whether to trigger an alarm or not according to an alarm strategy corresponding to each monitoring parameter, a plurality of values acquired periodically and a corresponding threshold value;
and reporting alarm information in response to an alarm being triggered.
2. The method of claim 1, wherein the periodically acquiring the corresponding values according to the acquisition intervals corresponding to each monitoring parameter respectively, further comprises:
acquiring the current number of bad blocks on each physical LUN;
and, in response to detecting newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
3. The method of claim 2, wherein determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold value further comprises:
and triggering a first-level alarm in response to the current number of bad blocks on the corresponding physical LUN, or the accumulated number of bad blocks, being larger than the bad block number threshold.
4. The method of claim 1, wherein the periodically acquiring the corresponding values according to the acquisition intervals corresponding to each monitoring parameter respectively, further comprises:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
5. The method of claim 4, wherein determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold value further comprises:
and, in response to the increment of the count being larger than the increment threshold, triggering an alarm of the level corresponding to the size of the increment.
6. The method of claim 1, wherein the periodically acquiring the corresponding values according to the acquisition intervals corresponding to each monitoring parameter respectively, further comprises:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
7. The method of claim 6, wherein determining whether to trigger an alarm according to the alarm policy corresponding to each monitoring parameter, the plurality of values periodically collected, and the corresponding threshold value further comprises:
judging whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature abnormality in response to the value of the temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the board in response to the temperature difference reaching the temperature difference threshold.
8. A hard disk monitoring system, comprising:
an obtaining module configured to obtain monitoring parameters;
a configuration module configured to configure, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm strategy;
a collection module configured to periodically acquire corresponding values according to the acquisition interval corresponding to each monitoring parameter;
a judging module configured to judge whether to trigger an alarm according to the alarm strategy corresponding to each monitoring parameter, the plurality of periodically acquired values and the corresponding threshold;
and an alarm module configured to report alarm information in response to an alarm being triggered.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111229365.8A CN114116374A (en) | 2021-10-21 | 2021-10-21 | Hard disk monitoring method, system, device and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111229365.8A CN114116374A (en) | 2021-10-21 | 2021-10-21 | Hard disk monitoring method, system, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114116374A (en) | 2022-03-01 |
Family
ID=80376425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111229365.8A Pending CN114116374A (en) | 2021-10-21 | 2021-10-21 | Hard disk monitoring method, system, device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114116374A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117555761A (en) * | 2024-01-11 | 2024-02-13 | 深圳市蓝智电子有限公司 | Mobile hard disk operation monitoring system based on Internet of things |
CN117555761B (en) * | 2024-01-11 | 2024-04-02 | 深圳市蓝智电子有限公司 | Mobile hard disk operation monitoring system based on Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10198196B2 (en) | Monitoring health condition of a hard disk | |
CN109739739B (en) | Disk failure prediction method, device and storage medium | |
JP4573179B2 (en) | Performance load abnormality detection system, performance load abnormality detection method, and program | |
CN111104293A (en) | Method, apparatus and computer program product for supporting disk failure prediction | |
CN111045881A (en) | Slow disk detection method and system | |
JP5277667B2 (en) | Failure analysis system, failure analysis method, failure analysis server, and failure analysis program | |
JP5933386B2 (en) | Data management apparatus and program | |
CN111090567A (en) | Link alarm method, equipment and storage medium | |
CN111104238B (en) | CE-based memory diagnosis method, device and medium | |
WO2013151544A1 (en) | Detection of unexpected server operation through physical attribute monitoring | |
CN114116374A (en) | Hard disk monitoring method, system, device and medium | |
JP4889618B2 (en) | Data processing apparatus, data processing method, and program | |
US20160110246A1 (en) | Disk data management | |
CN113903389A (en) | Slow disk detection method and device and computer readable and writable storage medium | |
CN103502951A (en) | Operation administration system, operation administration method, and program | |
CN113590429A (en) | Server fault diagnosis method and device and electronic equipment | |
WO2022165955A1 (en) | Flash memory abnormality detection method and apparatus, and computer device and storage medium | |
Tsai et al. | A study of soft error consequences in hard disk drives | |
CN113708986B (en) | Server monitoring apparatus, method and computer-readable storage medium | |
CN112445749A (en) | Signal detection recording method, system, device and medium | |
US11138088B2 (en) | Automated identification of events associated with a performance degradation in a computer system | |
CN111858244A (en) | Hard disk monitoring method, system, device and medium | |
JP2007189644A (en) | Managing device, managing method, and program | |
CN113127274A (en) | Disk failure prediction method, device, equipment and computer storage medium | |
JP4627327B2 (en) | Abnormality judgment device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||