CN114116374A - Hard disk monitoring method, system, device and medium

Info

Publication number
CN114116374A
Authority
CN
China
Prior art keywords
alarm
monitoring parameter
periodically
threshold value
values
Legal status
Pending
Application number
CN202111229365.8A
Other languages
Chinese (zh)
Inventor
苏军 (Su Jun)
Current Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Application filed by Zhengzhou Yunhai Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B33/00 Constructional parts, details or accessories not provided for in the other groups of this subclass
    • G11B33/14 Reducing influence of physical parameters, e.g. temperature change, moisture, dust
    • G11B33/1406 Reducing the influence of the temperature
    • G11B33/144 Reducing the influence of the temperature by detection, control, regulation of the temperature

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a hard disk monitoring method, which comprises the following steps: acquiring monitoring parameters; configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy; periodically collecting values of each monitoring parameter at its corresponding acquisition interval; determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold; and reporting alarm information in response to an alarm being triggered. The invention also discloses a system, a computer device and a readable storage medium. The proposed scheme accounts for the diversity of SSD operating conditions, analyzes the current state of the SSD along multiple dimensions, and reports alarm information when an SSD failure is predicted.

Description

Hard disk monitoring method, system, device and medium
Technical Field
The invention relates to the field of hard disks, and in particular to a hard disk monitoring method, system, device and storage medium.
Background
With the development and wide application of technologies such as the Internet, cloud computing and the Internet of Things, massive amounts of data are continuously generated and must be processed and stored, and the rapid development of information technology places higher demands on the performance of storage systems. Solid state disks (SSDs) are widely used because of their fast read/write speed and low power consumption. As the P/E (program/erase) cycle count increases, and under the influence of Tcross (temperature crossing, i.e. the difference between the temperature at which data is written and the temperature at which it is read), read disturb, data retention and similar effects, the NAND may become unstable. This shows up as more frequent triggering of the data error-correction flow and, in the worst case, as data decoding failures. All of these are symptoms of abnormal SSD operation and reduce the reliability of the SSD.
Disclosure of Invention
In view of the above, to overcome at least one aspect of the above problems, an embodiment of the present invention provides a hard disk monitoring method, including:
acquiring monitoring parameters;
configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
periodically collecting values of each monitoring parameter at its corresponding acquisition interval;
determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
and reporting alarm information in response to an alarm being triggered.
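The five steps above can be illustrated with a minimal Python sketch. It assumes that each monitoring parameter is configured with a numeric threshold, an acquisition interval in seconds and one of two hypothetical alarm policies: the latest value exceeds the threshold, or the increase between two adjacent samples exceeds the threshold. The parameter names, threshold values and policy names are illustrative assumptions, not values from the patent. Only the 60 s and 600 s intervals echo the 1-minute and 10-minute timers given as examples in the detailed description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Sequence


class Policy(Enum):
    # Hypothetical alarm policies; the patent does not enumerate them.
    LAST_ABOVE = "last_above"    # latest sample exceeds the threshold
    DELTA_ABOVE = "delta_above"  # increase between two adjacent samples exceeds the threshold


@dataclass(frozen=True)
class ParamConfig:
    threshold: float
    interval_s: int  # acquisition interval in seconds
    policy: Policy


# S1/S2: monitoring parameters and their per-parameter configuration (illustrative values).
CONFIGS: Dict[str, ParamConfig] = {
    "temperature_c": ParamConfig(threshold=70.0, interval_s=60, policy=Policy.LAST_ABOVE),
    "soft_decode_fail_count": ParamConfig(threshold=5, interval_s=600, policy=Policy.DELTA_ABOVE),
}


def should_alarm(cfg: ParamConfig, samples: Sequence[float]) -> bool:
    """S4: decide from the periodically collected samples and the threshold."""
    if not samples:
        return False
    if cfg.policy is Policy.LAST_ABOVE:
        return samples[-1] > cfg.threshold
    # Policy.DELTA_ABOVE needs at least two adjacent samples.
    return len(samples) >= 2 and samples[-1] - samples[-2] > cfg.threshold


def check_all(configs: Dict[str, ParamConfig], history: Dict[str, List[float]]) -> List[str]:
    """S4/S5: return the names of the parameters whose alarm should be reported."""
    return [name for name, cfg in configs.items() if should_alarm(cfg, history.get(name, []))]
```

For example, `check_all(CONFIGS, {"temperature_c": [65.0, 72.0], "soft_decode_fail_count": [10, 12]})` returns `["temperature_c"]`: the latest temperature exceeds 70, while the error counter rose by only 2 between the two samples. Keeping the policy as data next to the threshold and interval mirrors step S2, where all three are configured per parameter.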
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting a newly added bad block, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad-block count threshold.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the increment of a count being greater than an increment threshold, triggering an alarm of the grade corresponding to that increment.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a hard disk monitoring system, including:
an obtaining module configured to obtain monitoring parameters;
a configuration module configured to configure, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
an acquisition module configured to periodically collect values of each monitoring parameter at its corresponding acquisition interval;
a judging module configured to determine whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
and an alarm module configured to report alarm information in response to an alarm being triggered.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor, when executing the program, performs the following steps:
acquiring monitoring parameters;
configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
periodically collecting values of each monitoring parameter at its corresponding acquisition interval;
determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
and reporting alarm information in response to an alarm being triggered.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting a newly added bad block, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad-block count threshold.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the increment of a count being greater than an increment threshold, triggering an alarm of the grade corresponding to that increment.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
acquiring monitoring parameters;
configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
periodically collecting values of each monitoring parameter at its corresponding acquisition interval;
determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
and reporting alarm information in response to an alarm being triggered.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting a newly added bad block, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad-block count threshold.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the increment of a count being greater than an increment threshold, triggering an alarm of the grade corresponding to that increment.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
The invention has at least the following beneficial technical effect: the proposed scheme accounts for the diversity of SSD operating conditions, analyzes the current state of the SSD along multiple dimensions, and reports alarm information when an SSD failure is predicted.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other embodiments from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a hard disk monitoring method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a hard disk monitoring system according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
According to one aspect of the present invention, an embodiment of the present invention provides a hard disk monitoring method which, as shown in Fig. 1, may include the following steps:
S1, acquiring monitoring parameters;
S2, configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
S3, periodically collecting values of each monitoring parameter at its corresponding acquisition interval;
S4, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
S5, reporting alarm information in response to an alarm being triggered.
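Step S3 lets each parameter keep its own acquisition interval. One way to sketch this, assuming a single polling loop, is a priority queue keyed by each parameter's next due time. The function below only simulates the schedule over a time horizon; a real firmware or host daemon would instead sleep until the earliest due time and then read the value. The function name, parameter names and interval values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def collection_schedule(intervals_s: Dict[str, int], horizon_s: int) -> List[Tuple[int, str]]:
    """Return (time_s, parameter) for every collection due in [0, horizon_s], in order."""
    if any(interval <= 0 for interval in intervals_s.values()):
        raise ValueError("acquisition intervals must be positive")
    # Every parameter is first collected at t = 0; ties are broken by parameter name.
    heap = [(0, name) for name in intervals_s]
    heapq.heapify(heap)
    events: List[Tuple[int, str]] = []
    while heap and heap[0][0] <= horizon_s:
        due, name = heapq.heappop(heap)
        events.append((due, name))
        # Re-arm this parameter at its own acquisition interval.
        heapq.heappush(heap, (due + intervals_s[name], name))
    return events
```

With a 60 s temperature interval and a 600 s error-count interval, `collection_schedule({"temperature": 60, "ecc_counts": 600}, 600)` yields 13 collections: 11 temperature readings (at 0, 60, ..., 600 s) and 2 error-count readings (at 0 and 600 s). Each pop costs O(log n) in the number of parameters, so adding parameters with different intervals does not require extra timers.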
The proposed scheme accounts for the diversity of SSD operating conditions, analyzes the current state of the SSD along multiple dimensions, and reports alarm information when an SSD failure is predicted.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting a newly added bad block, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad-block count threshold.
Specifically, it can be detected whether the total number of bad blocks on each physical LUN exceeds the limit. The threshold depends on the type of NAND flash chip. When a new grown bad block (GBB) is detected, it is determined whether the accumulated count for that LUN exceeds the limit; if it does, a quality problem with that individual NAND chip is assumed, and the alarm is raised as a serious alarm.
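The bad-block check can be sketched as a small per-LUN tracker. This is a minimal sketch under stated assumptions: the caller supplies the bad-block count threshold (the text only says it depends on the NAND type), the first reading of a LUN is treated as a baseline, and the string "severe" is a hypothetical label for the first-level (serious) alarm. The class and method names are illustrative, not taken from the patent.

```python
from typing import Dict, Optional

SEVERE = "severe"  # hypothetical label for the first-level (serious) alarm


class BadBlockMonitor:
    """Tracks the bad-block count of each physical LUN between polls."""

    def __init__(self, threshold: int) -> None:
        # Per the text, the threshold depends on the NAND type; the caller supplies it.
        self.threshold = threshold
        self.last_count: Dict[int, int] = {}

    def update(self, lun: int, current_count: int) -> Optional[str]:
        previous = self.last_count.get(lun)
        self.last_count[lun] = current_count
        if previous is None or current_count <= previous:
            # First observation (baseline) or no grown bad block since the last poll.
            return None
        # A new GBB appeared: compare the LUN's accumulated count with the threshold.
        return SEVERE if current_count > self.threshold else None
```

Checking the threshold only when the count grows follows the text, which evaluates the accumulated value "when a new grown bad block is detected". A consequence of this sketch's baseline rule is that a LUN already over the limit at start-up is only flagged once it grows another bad block.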
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the increment of a count being greater than an increment threshold, triggering an alarm of the grade corresponding to that increment.
Specifically, a timer (e.g., 10 minutes) may be set to periodically poll the current counts of the error-correction flow, which covers first errors, read-retry failures and soft-decoding failures. The difference from the previous round is computed to obtain the number of newly added errors of each of the three types, and it is determined whether the increase within the detection period exceeds the corresponding threshold T0, T1 or T2.
It should be noted that the firmware (FW) design takes into account the impact of the error-correction flow on performance. A large increase within a short time indicates that the disk may be in an unstable state; in that case, current basic information, such as the temperature difference and the data retention of the problem block, needs to be recorded and an alarm reported at the same time.
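The per-period increment check can be sketched as follows. The three counter names correspond to the three error types in the text, and the thresholds play the role of T0, T1 and T2. The mapping from error type to alarm grade is an assumption made here for illustration (later stages of the recovery flow are treated as more severe). The sketch also assumes monotonically increasing counters and does not handle counter reset or wraparound.

```python
from typing import Dict, List, Tuple

# Hypothetical grade per error-correction stage (assumption: later stages are more severe).
GRADES: Dict[str, str] = {
    "first_error": "minor",
    "read_retry_fail": "major",
    "soft_decode_fail": "critical",
}


def ecc_alarms(prev: Dict[str, int], curr: Dict[str, int],
               thresholds: Dict[str, int]) -> List[Tuple[str, int, str]]:
    """Compare each counter's increase since the previous poll with its threshold."""
    alarms: List[Tuple[str, int, str]] = []
    for kind, limit in thresholds.items():
        delta = curr[kind] - prev[kind]  # newly added errors in this detection period
        if delta > limit:
            alarms.append((kind, delta, GRADES[kind]))
    return alarms
```

For example, with hypothetical thresholds `{"first_error": 100, "read_retry_fail": 10, "soft_decode_fail": 1}` and counts rising from `{"first_error": 1000, "read_retry_fail": 50, "soft_decode_fail": 3}` to `{"first_error": 1050, "read_retry_fail": 65, "soft_decode_fail": 4}`, only the read-retry failures (an increase of 15) exceed their threshold. A real implementation would also attach the context the text mentions, such as the temperature difference and the retention of the problem block, to each alarm record.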
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
Specifically, it can be detected whether any of the multiple temperature sensors in the SSD is abnormal: 1) whether the value of each temperature sensor exceeds the operating temperature threshold; and 2) whether the temperature difference between the sensors is large, which, if so, indicates uneven heat dissipation inside the disk. Considering the influence of temperature differences on the NAND, a timer period (e.g., 1 minute) is set and the difference between the current temperature and the previous temperature is determined; if this difference exceeds a specific threshold T3, the abnormal information is recorded and an alarm is reported.
It should be noted that both excessive temperature and large temperature differences may affect the correctness of data in the SSD in unpredictable ways, and this information can be recorded as a basis for failure analysis.
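The three temperature checks described above can be combined into one function evaluated once per timer period. This is a sketch under assumptions: sensor readings arrive as a name-to-degrees mapping, the spread between sensors is taken as the maximum minus the minimum reading (which equals the largest pairwise difference), and the comparison against T3 is read as a per-sensor change since the previous period. The event strings, sensor names and function name are hypothetical.

```python
from typing import Dict, List, Optional


def temperature_events(curr: Dict[str, float], prev: Optional[Dict[str, float]],
                       max_temp: float, max_spread: float, t3: float) -> List[str]:
    """Evaluate one timer period of temperature readings; returns event labels."""
    events: List[str] = []
    # 1) Any sensor reaching the operating-temperature threshold.
    for name, value in curr.items():
        if value >= max_temp:
            events.append(f"over_temperature:{name}")
    # 2) Largest difference between any two sensors (max - min) above the spread threshold.
    if len(curr) > 1 and max(curr.values()) - min(curr.values()) > max_spread:
        events.append("uneven_heat_dissipation")
    # 3) Change of each sensor since the previous timer period above T3.
    if prev:
        for name, value in curr.items():
            if name in prev and abs(value - prev[name]) > t3:
                events.append(f"temperature_swing:{name}")
    return events
```

With readings `{"ctrl": 78.0, "nand0": 55.0, "nand1": 57.0}`, previous readings `{"ctrl": 70.0, "nand0": 54.0, "nand1": 56.5}`, and illustrative limits of 75 °C, a 15 °C spread and T3 = 5 °C, the function reports an over-temperature on `ctrl`, uneven heat dissipation (a 23 °C spread) and a temperature swing on `ctrl` (8 °C in one period). Returning labels rather than raising alarms directly keeps the judging step separate from reporting, which also makes it easy to log the readings as failure-analysis data.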
The proposed scheme accounts for the diversity of SSD operating conditions, analyzes the current state of the SSD along multiple dimensions, and reports alarm information when an SSD failure is predicted. By periodically updating key runtime data and judging against rules whether the SSD is operating normally, the scheme monitors the running state of the SSD, makes suspicious running states easy to detect, and also provides a basis for assessing the performance and reliability of the SSD.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a hard disk monitoring system 400 which, as shown in Fig. 2, includes:
an obtaining module 401 configured to obtain monitoring parameters;
a configuration module 402 configured to configure, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
an acquisition module 403 configured to periodically collect values of each monitoring parameter at its corresponding acquisition interval;
a judging module 404 configured to determine whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
and an alarm module 405 configured to report alarm information in response to an alarm being triggered.
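The module structure of system 400 can be sketched as one class whose methods stand in for modules 401 to 405. This is a minimal sketch, not the patent's implementation: the `read_value` and `report` callables are hypothetical hooks (for example, a vendor log-page reader and an alarm sink), and the judging module uses only an assumed "latest value above threshold" policy. Collection is triggered when the tick time is a multiple of each parameter's interval.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class ParamSpec:
    threshold: float
    interval_s: int


@dataclass
class HardDiskMonitorSystem:
    read_value: Callable[[str], float]    # hypothetical data source used by module 403
    report: Callable[[str, float], None]  # hypothetical alarm sink used by module 405
    specs: Dict[str, ParamSpec] = field(default_factory=dict)
    history: Dict[str, List[float]] = field(default_factory=dict)

    def obtain(self, names: Iterable[str]) -> List[str]:
        """Obtaining module 401: the list of monitoring parameters."""
        return list(names)

    def configure(self, name: str, spec: ParamSpec) -> None:
        """Configuration module 402: threshold and interval per parameter."""
        self.specs[name] = spec

    def acquire(self, name: str) -> None:
        """Acquisition module 403: append one sample to the parameter's history."""
        self.history.setdefault(name, []).append(self.read_value(name))

    def judge(self, name: str) -> bool:
        """Judging module 404: assumed 'latest value above threshold' policy."""
        return self.history[name][-1] > self.specs[name].threshold

    def tick(self, now_s: int) -> None:
        """One scheduler tick; the alarm module 405 reports through self.report."""
        for name, spec in self.specs.items():
            if now_s % spec.interval_s == 0:
                self.acquire(name)
                if self.judge(name):
                    self.report(name, self.history[name][-1])
```

Injecting the reader and reporter keeps the judging logic independent of how values are fetched (in-drive firmware versus a host-side agent), which matches the separation between the acquisition, judging and alarm modules in the text.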
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting a newly added bad block, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks on the corresponding physical LUN being greater than a bad-block count threshold.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically acquiring the current count of each error correction type;
and calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the increment of a count being greater than an increment threshold, triggering an alarm of the grade corresponding to that increment.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding acquisition interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between each temperature sensor and the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the value of each temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the value of a temperature sensor reaching the temperature threshold;
and reporting uneven heat dissipation within the disk in response to the temperature difference reaching the temperature difference threshold.
The proposed scheme accounts for the diversity of SSD operating conditions, analyzes the current state of the SSD along multiple dimensions, and reports alarm information when an SSD failure is predicted. By periodically updating key runtime data and judging against rules whether the SSD is operating normally, the scheme monitors the running state of the SSD, makes suspicious running states easy to detect, and also provides a basis for assessing the performance and reliability of the SSD.
Based on the same inventive concept, according to another aspect of the present invention, as shown in Fig. 3, an embodiment of the present invention further provides a computer apparatus 501, including:
at least one processor 520; and
a memory 510 storing a computer program 511 executable on the processor, wherein the processor 520, when executing the program, performs the following steps:
S1, acquiring monitoring parameters;
S2, configuring, for each monitoring parameter, a corresponding threshold, acquisition interval and alarm policy;
S3, periodically collecting values of each monitoring parameter at its corresponding acquisition interval;
S4, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold;
S5, reporting alarm information in response to an alarm being triggered.
In some embodiments, the periodically acquiring the corresponding values according to the acquisition interval corresponding to each monitoring parameter further includes:
acquiring the current number of bad blocks on each physical LUN;
and in response to detecting the newly added bad blocks, obtaining the accumulated number of bad blocks according to the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks, obtained from the current number of bad blocks on the corresponding physical LUN, being greater than a bad-block count threshold.
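The bad-block rule can be sketched as a single comparison. The numeric code `FIRST_LEVEL = 1`, the `Alarm` record and the message format are assumptions for illustration; the text does not define how alarm levels are encoded.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_LEVEL = 1  # hypothetical numeric code for a "first-level" alarm


@dataclass
class Alarm:
    level: int
    message: str


def check_bad_blocks(lun: int, current: int, accumulated: int,
                     threshold: int) -> Optional[Alarm]:
    """Return a first-level alarm if the accumulated count exceeds the threshold.

    The current count of the LUN that reported new bad blocks is included in
    the message so the alarm can be traced back to that LUN.
    """
    if accumulated > threshold:
        return Alarm(FIRST_LEVEL,
                     f"LUN {lun}: {current} bad blocks, "
                     f"{accumulated} accumulated > threshold {threshold}")
    return None
```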
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding collection interval further includes:
periodically acquiring the current count of each error correction type; and
calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
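The per-period increment is the difference between two successive counter snapshots. In this sketch the counter names are hypothetical, and clamping a decreasing counter (for instance after a reset) to an increment of 0 is an assumption the text does not address.

```python
from typing import Dict


def ecc_increments(previous: Dict[str, int],
                   current: Dict[str, int]) -> Dict[str, int]:
    """Per-type increase of error-correction counters between adjacent polls.

    A type missing from `previous` is treated as starting from 0. A counter
    that went backwards (e.g. after a reset) is clamped to an increment of 0.
    """
    return {etype: max(0, count - previous.get(etype, 0))
            for etype, count in current.items()}
```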
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the count increment being greater than an increment threshold, triggering an alarm of the level corresponding to the increment.
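One way to read "an alarm of the level corresponding to the increment" is a tiered lookup: larger increments cross higher thresholds and map to a different level. The tier values and the level numbering in `ECC_TIERS` are invented for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical tiers: (increment threshold, alarm level), highest threshold
# first. Both the numbers and the level numbering are illustrative only.
ECC_TIERS: Sequence[Tuple[int, int]] = ((1000, 1), (100, 2), (10, 3))


def ecc_alarm_level(increment: int,
                    tiers: Sequence[Tuple[int, int]] = ECC_TIERS) -> Optional[int]:
    """Return the level of the first tier whose threshold the increment exceeds."""
    for threshold, level in tiers:
        if increment > threshold:
            return level
    return None   # below every increment threshold: no alarm
```

Ordering the tiers from the highest threshold down means the first match is the most specific one, so a single pass suffices.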
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding collection interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between the reading of each temperature sensor and those of the other sensors.
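A sketch of the temperature-difference calculation, assuming "the difference between each sensor and the other sensors" is summarized as the largest absolute difference to any other sensor. The sensor names are hypothetical.

```python
from itertools import combinations
from typing import Dict


def max_sensor_differences(readings: Dict[str, float]) -> Dict[str, float]:
    """For each sensor, the largest absolute difference to any other sensor."""
    diffs = {name: 0.0 for name in readings}
    for a, b in combinations(readings, 2):
        d = abs(readings[a] - readings[b])
        diffs[a] = max(diffs[a], d)
        diffs[b] = max(diffs[b], d)
    return diffs
```

With readings of 60 °C on a controller sensor and 45 °C and 48 °C on two NAND sensors, the controller and the cooler NAND sensor both show a maximum difference of 15 °C.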
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the reading of a temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the reading of the temperature sensor reaching the temperature threshold; and
reporting uneven heat dissipation on the board in response to the temperature difference reaching the temperature difference threshold.
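The two temperature checks can be sketched together. Two assumptions here: a reading "reaches" the temperature threshold when it is greater than or equal to it, and the difference check uses a strict "greater than", since the text uses both "greater than" and "reaching" for that comparison. The event strings are illustrative.

```python
from typing import Dict, List


def judge_temperatures(readings: Dict[str, float], temp_threshold: float,
                       diff_threshold: float) -> List[str]:
    """Return the temperature events to report for one collection period."""
    events = []
    for name, value in readings.items():
        if value >= temp_threshold:            # reading "reaches" the threshold
            events.append(f"temperature anomaly: {name}={value}")
    # The largest difference between any two sensors is max - min.
    if readings and max(readings.values()) - min(readings.values()) > diff_threshold:
        events.append("uneven heat dissipation on the board")
    return events
```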
Based on the same inventive concept, according to another aspect of the present invention, as shown in Fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601 storing computer program instructions 610 which, when executed by a processor, perform the following steps:
S1: acquiring monitoring parameters;
S2: configuring, for each monitoring parameter, a corresponding threshold, collection interval and alarm policy;
S3: periodically collecting values of each monitoring parameter at its corresponding collection interval;
S4: determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold; and
S5: in response to an alarm being triggered, reporting alarm information.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding collection interval further includes:
acquiring the current number of bad blocks on each physical LUN; and
in response to detecting newly added bad blocks, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
triggering a first-level alarm in response to the accumulated number of bad blocks, obtained from the current number of bad blocks on the corresponding physical LUN, being greater than a bad-block count threshold.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding collection interval further includes:
periodically acquiring the current count of each error correction type; and
calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
in response to the count increment being greater than an increment threshold, triggering an alarm of the level corresponding to the increment.
In some embodiments, periodically collecting values of each monitoring parameter at its corresponding collection interval further includes:
periodically collecting the temperature of each temperature sensor and calculating the difference between the reading of each temperature sensor and those of the other sensors.
In some embodiments, determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further includes:
determining whether the reading of a temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the reading of the temperature sensor reaching the temperature threshold; and
reporting uneven heat dissipation on the board in response to the temperature difference reaching the temperature difference threshold.
Finally, it should be noted that, as those skilled in the art will understand, all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed herein are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present invention, including the claims, is limited to these examples. Within the idea of the embodiments of the present invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments described above exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A hard disk monitoring method, characterized by comprising the following steps:
acquiring monitoring parameters;
configuring, for each monitoring parameter, a corresponding threshold, collection interval and alarm policy;
periodically collecting values of each monitoring parameter at its corresponding collection interval;
determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold; and
in response to an alarm being triggered, reporting alarm information.
2. The method of claim 1, wherein periodically collecting values of each monitoring parameter at its corresponding collection interval further comprises:
acquiring the current number of bad blocks on each physical LUN; and
in response to detecting newly added bad blocks, obtaining the accumulated number of bad blocks from the current number of bad blocks on the corresponding physical LUN.
3. The method of claim 2, wherein determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further comprises:
triggering a first-level alarm in response to the accumulated number of bad blocks, obtained from the current number of bad blocks on the corresponding physical LUN, being greater than a bad-block count threshold.
4. The method of claim 1, wherein periodically collecting values of each monitoring parameter at its corresponding collection interval further comprises:
periodically acquiring the current count of each error correction type; and
calculating, for each error correction type, the increment between the counts acquired in two adjacent periods.
5. The method of claim 4, wherein determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further comprises:
in response to the count increment being greater than an increment threshold, triggering an alarm of the level corresponding to the increment.
6. The method of claim 1, wherein periodically collecting values of each monitoring parameter at its corresponding collection interval further comprises:
periodically collecting the temperature of each temperature sensor and calculating the difference between the reading of each temperature sensor and those of the other sensors.
7. The method of claim 6, wherein determining whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold further comprises:
determining whether the reading of a temperature sensor reaches a temperature threshold and whether the difference is greater than a temperature difference threshold;
reporting a temperature anomaly in response to the reading of the temperature sensor reaching the temperature threshold; and
reporting uneven heat dissipation on the board in response to the temperature difference reaching the temperature difference threshold.
8. A hard disk monitoring system, comprising:
an acquisition module configured to acquire monitoring parameters;
a configuration module configured to configure, for each monitoring parameter, a corresponding threshold, collection interval and alarm policy;
a collection module configured to periodically collect values of each monitoring parameter at its corresponding collection interval;
a judging module configured to determine whether to trigger an alarm according to the alarm policy of each monitoring parameter, the plurality of periodically collected values and the corresponding threshold; and
an alarm module configured to report alarm information in response to an alarm being triggered.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor executes the program to perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
CN202111229365.8A 2021-10-21 2021-10-21 Hard disk monitoring method, system, device and medium Pending CN114116374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111229365.8A CN114116374A (en) 2021-10-21 2021-10-21 Hard disk monitoring method, system, device and medium

Publications (1)

Publication Number Publication Date
CN114116374A true CN114116374A (en) 2022-03-01

Family

ID=80376425

Country Status (1)

Country Link
CN (1) CN114116374A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555761A (en) * 2024-01-11 2024-02-13 深圳市蓝智电子有限公司 Mobile hard disk operation monitoring system based on Internet of things
CN117555761B (en) * 2024-01-11 2024-04-02 深圳市蓝智电子有限公司 Mobile hard disk operation monitoring system based on Internet of things

Similar Documents

Publication Publication Date Title
US10198196B2 (en) Monitoring health condition of a hard disk
CN109739739B (en) Disk failure prediction method, device and storage medium
JP4573179B2 (en) Performance load abnormality detection system, performance load abnormality detection method, and program
CN111104293A (en) Method, apparatus and computer program product for supporting disk failure prediction
CN111045881A (en) Slow disk detection method and system
JP5277667B2 (en) Failure analysis system, failure analysis method, failure analysis server, and failure analysis program
JP5933386B2 (en) Data management apparatus and program
CN111090567A (en) Link alarm method, equipment and storage medium
CN111104238B (en) CE-based memory diagnosis method, device and medium
WO2013151544A1 (en) Detection of unexpected server operation through physical attribute monitoring
CN114116374A (en) Hard disk monitoring method, system, device and medium
JP4889618B2 (en) Data processing apparatus, data processing method, and program
US20160110246A1 (en) Disk data management
CN113903389A (en) Slow disk detection method and device and computer readable and writable storage medium
CN103502951A (en) Operation administration system, operation administration method, and program
CN113590429A (en) Server fault diagnosis method and device and electronic equipment
WO2022165955A1 (en) Flash memory abnormality detection method and apparatus, and computer device and storage medium
Tsai et al. A study of soft error consequences in hard disk drives
CN113708986B (en) Server monitoring apparatus, method and computer-readable storage medium
CN112445749A (en) Signal detection recording method, system, device and medium
US11138088B2 (en) Automated identification of events associated with a performance degradation in a computer system
CN111858244A (en) Hard disk monitoring method, system, device and medium
JP2007189644A (en) Managing device, managing method, and program
CN113127274A (en) Disk failure prediction method, device, equipment and computer storage medium
JP4627327B2 (en) Abnormality judgment device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination