CN115114053A - Hard disk reliability test method, system, storage medium and equipment - Google Patents

Info

Publication number
CN115114053A
Authority
CN
China
Prior art keywords
hard disk
test
tool
environment
sdp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210747664.9A
Other languages
Chinese (zh)
Other versions
CN115114053B (en)
Inventor
葛均红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210747664.9A priority Critical patent/CN115114053B/en
Publication of CN115114053A publication Critical patent/CN115114053A/en
Application granted granted Critical
Publication of CN115114053B publication Critical patent/CN115114053B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/008: Reliability or availability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides a hard disk reliability test method, system, storage medium and device. The method comprises the following steps: installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, and using smartctl to confirm the current test environment and acquire the SMART information of all hard disks; using sdparm to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the cache can be enabled with the corresponding tool, and then setting the queue depth and stripe depth of the test; according to the set queue depth and stripe depth, executing a continuous cycle test based on random writes to the hard disk while smartctl accesses the SMART information and/or ipmitool sets the fan, the continuous cycle test satisfying a given stress test condition; powering off the machine after the continuous cycle test has run for a predetermined execution time, and running the continuous cycle test for the predetermined execution time again after a predetermined pause time; and monitoring whether the hard disk is locked during all test periods in order to test the reliability of the hard disk.

Description

Hard disk reliability test method, system, storage medium and equipment
Technical Field
The invention relates to the technical field of servers, in particular to the technical field of server testing, and specifically relates to the technical field of server hard disk reliability testing.
Background
With the continuous development of IT, a server product provider must ensure that its servers remain stable and reliable in a variety of application environments over long periods of operation, while also meeting users' diverse demands. Hard disk quality is particularly important to the stability and reliability of the whole server: a poor-quality hard disk can cause serious server problems and seriously harm the customer. Testers therefore need to consider how, at the test level, to reduce the online failure rate of the main promoted hard disks at the customer side and to intercept poor-quality hard disks before they ship with a server. Following the INSPUR "0 defect" quality culture, the new test method described here was summarized, refined and developed from practical experience, and is also one way for testers to pursue zero defects in their own work.
The quality and reliability of a hard disk on a server are generally verified with the fio tool: the performance and stress of the hard disk are tested in a steady-state environment, and after the test the SMART information of the hard disk, the system and the BMC (baseboard management controller) are checked for reported errors. Although the scripts differ, they all essentially run a fio baseline test on the hard disk while the OS is in a steady state.
The existing test method is mainly based on sg_dd and the batch script HDD_Fio_Test_Linux.sh, and comprises the following steps:
1) on a Linux system, installing fio after libaio has been pre-installed;
2) running sg_dd or HDD_Fio_Test_Linux.sh, adding the drive letter of the disk under test, and testing 4krr (4k random read), 4krw (4k random write), 256ksr (256k sequential read) and 256ksw (256k sequential write).
HDD_Fio_Test_Linux.sh runs while the system is stable, so problems generally do not appear, and hard disk suppliers optimize the OD/ID of the hard disk for this conventional data-block test mode. As a result, problems in the hard disk itself go undetected by the conventional test method.
In addition, non-read/write LPC commands, such as SMART Read Data, SMART Read Log and SMART Return Status, are issued by the system, but not in real time, so their influence on hard disk reads and writes is intermittent rather than sustained.
In addition, to protect hard disk data the hard disk cache is usually disabled; problems tend to appear when the cache is enabled to improve read and write speed.
Moreover, the fan speed is low in a stable environment and has little influence on the hard disk.
Finally, when a bus trace is being captured for debugging, each execution of sg_dd issues an index, which disturbs the trace capture in an uncontrollable way.
Therefore, to overcome these shortcomings of the prior art, an optimized hard disk reliability test method is needed that tests the reliability of a mechanical hard disk by amplifying environmental risk points under Linux, avoiding the uncontrollability of the prior art and the failure of overly conventional test methods to expose problems in advance.
Disclosure of Invention
In view of the above, the present invention provides an improved method, system, storage medium and device for testing reliability of a hard disk in a Linux system, so as to solve the above problems in the prior art.
Based on the above purpose, in one aspect, the present invention provides a method for testing reliability of a hard disk, wherein the method includes the following steps:
installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
using the sdparm tool to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and then setting the queue depth and stripe depth of the test;
according to the set queue depth and stripe depth, executing a continuous cycle test based on random writes to the hard disk while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition;
powering off the machine after the continuous cycle test has run for a predetermined execution time, and running the continuous cycle test for the predetermined execution time again after a predetermined pause time;
and monitoring whether the hard disk is locked during all test periods in order to test the reliability of the hard disk.
In some embodiments of the hard disk reliability test method according to the present invention, executing, according to the set queue depth and stripe depth, a continuous cycle test based on random writes to the hard disk while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition, further comprises:
the given stress test condition includes that the test satisfies:
random write processing is performed on mdadm with the cache in WC mode;
a random-write FUA workload is superimposed on previously written WCE data; and
any flush actions are disabled.
In some embodiments of the hard disk reliability test method according to the present invention, using the sdparm tool to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and then setting the queue depth and stripe depth of the test, further comprises:
the queue depth is set to a predetermined number of values, and the stripe depth is set to a plurality of incremental values within a predetermined interval, wherein every combination of the values is cross-tested for a predetermined time interval.
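The cross test of queue depths against stripe depths can be sketched as a simple nested enumeration. This is only an illustration: the concrete values, the function names and the run_case placeholder are assumptions, since the text gives no numbers at this point and does not say how each case is launched.

```shell
#!/bin/sh
# Hypothetical values: the text only speaks of "a predetermined number of
# values" and "incremental values within a predetermined interval".
QUEUE_DEPTHS="1 32"
STRIPE_DEPTHS_KB="4 8 16 32 64 128"

# list_test_cases QUEUE_DEPTHS STRIPE_DEPTHS
# Print one "queue_depth stripe_depth" pair per line, covering the full
# cross product of the two space-separated lists.
list_test_cases() {
    for qd in $1; do
        for sd in $2; do
            echo "$qd $sd"
        done
    done
}

# run_case QUEUE_DEPTH STRIPE_DEPTH SECONDS
# Placeholder for the real workload; here it only reports what would run.
run_case() {
    echo "would test queue depth $1, stripe depth ${2}K for $3 s"
}

# run_matrix QUEUE_DEPTHS STRIPE_DEPTHS SECONDS
# Walk the matrix, giving every combination the same time slice.
run_matrix() {
    list_test_cases "$1" "$2" | while read -r qd sd; do
        run_case "$qd" "$sd" "$3"
    done
}

# Example call: run_matrix "$QUEUE_DEPTHS" "$STRIPE_DEPTHS_KB" 600
```

Keeping the enumeration separate from run_case makes it easy to swap in the actual random write workload without touching the loop.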
In some embodiments of the hard disk reliability test method according to the present invention, installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, and using the smartctl tool to confirm the current test environment and acquire the SMART information of all hard disks, further comprises:
using the smartctl tool to detect whether the SMART information can be viewed with a direct connection command, in order to judge whether the hard disk environment is a direct connection environment;
and in response to the SMART information being viewable normally, judging the hard disk environment to be a direct connection environment and capturing the SMART information of all hard disks with the direct connection command.
In some embodiments of the hard disk reliability test method according to the present invention, installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, and using the smartctl tool to confirm the current test environment and acquire the SMART information of all hard disks, further comprises:
in response to the SMART information not being viewable with the direct connection command, further checking whether the hard disk environment is an LSI RAID or a PMC RAID environment;
in response to the hard disk environment being judged to be an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the SMART information of all hard disks with LSI commands;
and in response to the hard disk environment being judged to be a PMC RAID environment, installing the RAID card tool arcconf and obtaining the SMART information of all hard disks with PMC commands.
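The environment decision described above (direct connection first, then LSI RAID, then PMC RAID) might be sketched as follows. The function name classify_env and its two-argument interface are assumptions made for illustration; the grep patterns are the vendor strings matched against lspci output in the detailed description.

```shell
#!/bin/sh
# classify_env PROBE_STATUS PCI_LISTING
#   PROBE_STATUS: 0 if the SMART information could be viewed with the direct
#                 connection command, non-zero otherwise
#   PCI_LISTING:  text as printed by lspci
# Prints one of: direct, lsi, pmc, unknown.
classify_env() {
    if [ "$1" -eq 0 ]; then
        echo direct
    elif printf '%s\n' "$2" | grep -qE 'LSI|AVAGO'; then
        echo lsi
    elif printf '%s\n' "$2" | grep -q 'Adaptec'; then
        echo pmc
    else
        echo unknown
    fi
}

# Example call on a real machine (the device name is an assumption):
#   smartctl -a /dev/sda >/dev/null 2>&1
#   classify_env $? "$(lspci)"
```

Passing the lspci text in as an argument keeps the decision logic separate from the hardware probe, so it can be exercised on a machine without a RAID card.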
In some embodiments of the hard disk reliability test method according to the present invention, using the sdparm tool to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and then setting the queue depth and stripe depth of the test, further comprises:
in response to the hard disk environment being a non-direct connection environment, setting the hard disk to JBOD mode;
confirming, for a hard disk in a direct connection environment or a hard disk in a non-direct connection environment that has been set to JBOD mode, whether the hard disk cache is enabled;
in response to the hard disk cache not being enabled, obtaining the interface type of the hard disk from the SMART information and judging the hard disk type based on the interface type;
in response to the hard disk being judged to be SAS, enabling the hard disk cache with the sdparm tool and restarting to confirm that the hard disk cache is enabled;
in response to the hard disk being judged to be SATA, further obtaining the hard disk manufacturer information from the SMART information, installing the corresponding tool according to the manufacturer information to enable the hard disk cache, and restarting to confirm that the hard disk cache is enabled;
and in response to confirmation that the hard disk cache is enabled, setting the disk array card to RAID mode.
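A minimal sketch of the interface check and cache enabling for the SAS branch is given below. It assumes that the smartctl output contains a "Transport protocol:" line for SAS disks and a "SATA Version is:" line for SATA disks; the function names are hypothetical. The SATA branch is left as a message because the text does not name the manufacturer tools.

```shell
#!/bin/sh
# disk_interface: read "smartctl -a" output on stdin; print sas, sata or unknown.
disk_interface() {
    awk '
        /^Transport protocol:.*SAS/ { t = "sas" }
        /^SATA Version is:/         { t = "sata" }
        END { print (t == "" ? "unknown" : t) }
    '
}

# enable_write_cache DEVICE
# SAS:  set the WCE (write cache enable) bit with sdparm and save it.
# SATA: a manufacturer-specific tool is needed, which is not named in the text.
enable_write_cache() {
    kind=$(smartctl -a "$1" | disk_interface)
    case "$kind" in
        sas)
            sdparm --set=WCE --save "$1" ;;
        sata)
            echo "SATA disk $1: use the manufacturer's tool" >&2
            return 2 ;;
        *)
            echo "cannot determine interface of $1" >&2
            return 1 ;;
    esac
}

# After a restart, "sdparm --get=WCE DEVICE" can be used to confirm the setting.
```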
In some embodiments of the hard disk reliability test method according to the present invention, executing, according to the set queue depth and stripe depth, a continuous cycle test based on random writes to the hard disk while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition, further comprises:
during the continuous cycle test, stepping the fan speed through a plurality of preset incremental values with the ipmitool, one step every preset time interval.
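The stepped fan regulation can be pictured as a loop over preset duty values. In the sketch below, set_fan_duty is a deliberate placeholder: the BMC command that sets the fan speed is vendor specific (typically an ipmitool raw sequence) and is not given in the text, so no concrete command is assumed. The duty values and interval in the example call are likewise hypothetical.

```shell
#!/bin/sh
# set_fan_duty PERCENT
# Placeholder only: replace the body with the vendor-specific ipmitool
# command for the server under test.
set_fan_duty() {
    echo "fan duty -> $1%"
}

# ramp_fan INTERVAL_SECONDS DUTY...
# Apply each duty value in order, waiting INTERVAL_SECONDS after each step.
ramp_fan() {
    interval=$1
    shift
    for duty in "$@"; do
        set_fan_duty "$duty"
        sleep "$interval"
    done
}

# Example call: ramp_fan 1800 30 50 70 100
```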
In another aspect of the present invention, a system for testing reliability of a hard disk is further provided, including:
an environment configuration module configured to install, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
a condition setting module configured to use the sdparm tool to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and then to set the queue depth and stripe depth of the test;
a continuous testing module configured to execute, according to the set queue depth and stripe depth, a continuous cycle test based on random writes to the hard disk while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition;
a loop execution module configured to power off the machine after the continuous cycle test has run for a predetermined execution time, and to run the continuous cycle test for the predetermined execution time again after a predetermined pause time;
and a result monitoring module configured to monitor whether the hard disk is locked during all test periods in order to test the reliability of the hard disk.
In still another aspect of the present invention, there is also provided a computer-readable storage medium storing computer program instructions, which when executed, implement any one of the above-mentioned hard disk reliability testing methods according to the present invention.
In still another aspect of the present invention, there is also provided a computer apparatus including a memory and a processor, the memory having stored therein a computer program, the computer program, when executed by the processor, performing any one of the above-described hard disk reliability test methods according to the present invention.
The invention amplifies the influence of test-environment risk points on the reads and writes of a mechanical hard disk. It automatically identifies the application environment and continuously accesses the hard disk SMART information in each environment, so that hard disk reads and writes are continuously disturbed; it automatically identifies the hard disk interface and enables the hard disk cache accordingly, which increases the risk of data loss on abnormal power failure; it regulates the fan speed to strengthen the RV (rotational vibration) influence; and it automatically runs random write tests on the hard disk with abnormal power-off, superimposing a random-write FUA (force unit access) workload on already written WCE data. In this way it exposes problems that conventional test methods cannot detect. Based on the above technical solution, the invention has at least the following beneficial technical effects:
1. an unconventional test method is adopted, so that problems are no longer masked by hard disk suppliers having optimized the OD/ID of the hard disk for the conventional test mode;
2. the influence on hard disk reads and writes of the LPC commands that access the hard disk SMART information is amplified and made continuous;
3. the influence of the fan speed on hard disk reads and writes is amplified;
4. the hard disk cache is enabled and power is cut while the hard disk is being written; this raises the read/write speed but also increases the risk of data loss on abnormal power failure, and the influence of the test environment is continuously amplified, so that problems are discovered and solved early.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from them without creative effort.
In the drawings:
FIG. 1 shows a schematic block diagram of an embodiment of a hard disk reliability test method according to the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of hard disk smart information capture in a hard disk reliability testing method in accordance with the present invention;
FIG. 3 is a schematic flow chart of an embodiment of opening a hard disk Cache according to the method for testing the reliability of a hard disk of the present invention;
FIG. 4 shows a schematic flow chart diagram of an embodiment of a random write test on a hard disk according to the hard disk reliability test method of the present invention;
FIG. 5 shows a schematic block diagram of an embodiment of a hard disk reliability test system according to the present invention;
FIG. 6 shows a schematic diagram of an embodiment of a computer readable storage medium implementing a hard disk reliability test method according to the invention;
FIG. 7 is a schematic hardware configuration diagram of an embodiment of a computer device implementing a method for testing reliability of a hard disk according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name or two different parameters; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements and may include other steps or elements that are not listed or that are inherent to it.
To solve the problems of the prior art, the invention provides a method for testing the reliability of a mechanical hard disk by amplifying environmental risk points under Linux, overcoming the uncontrollability of the prior art and the failure of overly conventional test methods to expose problems in advance. In short, the concept of the invention is to automatically identify the application environment; to continuously access the hard disk SMART information in the different environments so that hard disk reads and writes are continuously disturbed; to automatically identify the hard disk interface and enable the hard disk cache accordingly, increasing the risk of data loss on abnormal power failure; to regulate the fan speed, strengthening the RV influence; and to automatically run random write tests on the hard disk with abnormal power-off, superimposing a random-write FUA workload on already written WCE data. This exposes problems that conventional test methods cannot detect.
For ease of programming and execution, three modules can be constructed by function and called as required during implementation of the method: module one automatically identifies the environment and accesses and captures the hard disk SMART information; module two automatically identifies the hard disk interface and manufacturer and enables the hard disk cache; module three combines the first two modules to run the random write test on the hard disk. The three modules are described in detail below.
Therefore, in a first aspect of the present invention, a method 100 for testing reliability of a hard disk is provided. Fig. 1 shows a schematic block diagram of an embodiment of a hard disk reliability test method according to the present invention. In the embodiment shown in fig. 1, the method comprises:
step S110: installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
step S120: using the sdparm tool to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and then setting the queue depth and stripe depth of the test;
step S130: according to the set queue depth and stripe depth, executing a continuous cycle test based on random writes to the hard disk while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition;
step S140: powering off the machine after the continuous cycle test has run for a predetermined execution time, and running the continuous cycle test for the predetermined execution time again after a predetermined pause time;
step S150: monitoring whether the hard disk is locked during all test periods in order to test the reliability of the hard disk.
In summary, to address the problems of the prior art, the invention tests the reliability of a mechanical hard disk by amplifying environmental risk points under Linux. To amplify the environmental risk points and execute the corresponding test process automatically, the test environment must be configured first. Therefore, in step S110, at least the smartctl tool, the sdparm tool, the fio component and ipmitool are installed on Linux, and the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks. Here, preferably, step S110 mainly involves executing the aforementioned module three and its call to module one.
After the test environment has been configured, the test conditions also need to be set. Therefore, in step S120, the sdparm tool is used to confirm that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer so that the hard disk cache is enabled with the corresponding tool, and the queue depth and stripe depth of the test are then set. Here, preferably, step S120 mainly involves executing the aforementioned module three and its call to module two.
On this basis, according to the queue depth and stripe depth set in step S120, a continuous cycle test based on random writes to the hard disk is executed in step S130 while the smartctl tool accesses the SMART information and/or the ipmitool sets the fan, wherein the continuous cycle test satisfies a given stress test condition. Here, preferably, step S130 includes executing the aforementioned module three and its call to module one.
A sufficiently effective test usually requires many iterations, with adjustments as needed. Thus, in step S140, the continuous cycle test runs for a predetermined execution time, for example 12 hours (h); the machine is then powered off, and after a predetermined pause time, for example 30 seconds (s), the continuous cycle test is run again for the predetermined execution time, for example 12 hours (h).
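One way to picture the run, power-off, pause and re-run schedule is the sketch below. It assumes the schedule is driven from a separate control host that reaches the BMC of the machine under test over the network with ipmitool; the text does not say where the power-off command is issued. BMC_HOST, BMC_USER, BMC_PASS and start_workload are hypothetical names, and only the 12 h and 30 s figures come from the text.

```shell
#!/bin/sh
RUN_SECONDS=$((12 * 60 * 60))   # 12 h execution time, example from the text
PAUSE_SECONDS=30                # 30 s pause time, example from the text

# bmc ARGS...
# Out-of-band access to the machine under test; BMC_HOST, BMC_USER and
# BMC_PASS are hypothetical environment variables.
bmc() {
    ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -P "$BMC_PASS" "$@"
}

# start_workload
# Placeholder for launching the random write test on the machine under test.
start_workload() {
    echo "workload started"
}

# one_cycle RUN_SECONDS PAUSE_SECONDS
# Run the workload, cut power, wait, then restore power for the next cycle.
one_cycle() {
    start_workload
    sleep "$1"
    bmc chassis power off
    sleep "$2"
    bmc chassis power on
}

# Example call, repeated for as many cycles as required:
#   one_cycle "$RUN_SECONDS" "$PAUSE_SECONDS"
```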
Finally, in step S150, whether the hard disk is locked is monitored during all test periods in order to test the reliability of the hard disk. Specifically, during the tests of steps S130 to S140, invalid address information may be written into the NVC; after a restart the hard disk reads the NVC content and becomes locked. Because SMART information access and fan speed continuously affect hard disk performance, testing in such an unfavorable environment can expose more problems. Monitoring whether the hard disk is locked during all test periods therefore shows whether the hard disk has such problems, and allows its reliability to be evaluated further.
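The text does not specify how a locked hard disk is detected. As one possible check, the sketch below treats a disk as suspect when smartctl can no longer open it or its commands fail, using bits 1 and 2 of the smartctl exit status (device open failed; a SMART or ATA command failed). This interpretation of "locked", and the function names, are assumptions.

```shell
#!/bin/sh
# disk_responds DEVICE
# Succeed if smartctl could open the device and its commands completed.
# Other exit status bits (for example a failing health check) are ignored
# here, since they mean the disk still answered.
disk_responds() {
    smartctl -H "$1" >/dev/null 2>&1
    status=$?
    [ $((status & 6)) -eq 0 ]
}

# report_locked DEVICE...
# Print every device that stopped responding.
report_locked() {
    for dev in "$@"; do
        disk_responds "$dev" || echo "$dev"
    done
}

# Example call after each restart: report_locked /dev/sda /dev/sdb
```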
The process according to the invention is further explained below with reference to specific examples. It is to be noted that the following specific examples are intended to illustrate the implementation of the method according to the invention and should not be considered as limiting the method according to the invention.
The method amplifies, under Linux, the influence of test-environment risk points on the reads and writes of a mechanical hard disk. This requires automatically identifying the application environment, continuously accessing the hard disk SMART information in the different environments, automatically identifying the hard disk interface and enabling the hard disk cache accordingly, regulating the fan speed, and automatically running random write tests on the hard disk according to the stripe depth and queue depth.
First, module one automatically identifies the environment and accesses and captures the SMART information of the hard disk. Fig. 2 shows a schematic flow chart of an embodiment of hard disk SMART information capture, i.e. module one above, in the hard disk reliability test method according to the present invention. Module one comprises the following steps.
Module one, step 1: install the Linux system and install the smartctl tool.
Module one, step 2: use smartctl -a /dev/sdX, where X is the drive letter, to check whether the SMART information of every hard disk can be obtained normally.
Module one, step 3: if the SMART information can be obtained, the environment is a direct connection environment; capture the information with the command of step 2 and exit.
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S110, installing, on Linux, at least the smartctl tool, the sdparm tool, the fio component and ipmitool, and using the smartctl tool to confirm the current test environment and acquire the SMART information of all hard disks, further comprises:
using the smartctl tool to detect whether the SMART information can be viewed with a direct connection command, in order to judge whether the hard disk environment is a direct connection environment;
and in response to the SMART information being viewable normally, judging the hard disk environment to be a direct connection environment and capturing the SMART information of all hard disks with the direct connection command.
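The direct connection capture of steps 2 and 3 could look like the following sketch. The function name, the output directory layout and the choice to treat only bits 0 to 2 of the smartctl exit status as "cannot be viewed" are assumptions; the higher bits report disk health findings while the information is still readable.

```shell
#!/bin/sh
# capture_direct OUTDIR DEVICE...
# Save the "smartctl -a" output of each device under OUTDIR. Return non-zero
# at the first device whose SMART information cannot be read, which suggests
# the environment is not a direct connection environment.
capture_direct() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        smartctl -a "$dev" > "$outdir/${dev##*/}.smart" 2>&1
        status=$?
        if [ $((status & 7)) -ne 0 ]; then
            echo "SMART information not readable: $dev" >&2
            return 1
        fi
    done
}

# Example call (the device glob is an assumption):
#   capture_direct ./smart /dev/sd?
```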
Module one, step 4: if the SMART information cannot be obtained, use lspci | grep -E 'LSI|AVAGO' ($hasLsi) to judge whether the environment is an LSI RAID environment, and install the MegaCli64 tool in the LSI environment.
Module one, step 5: obtain the megaraid device list ($mega_device) using
MegaCli64 -pdlist -aall | awk -F ":" '/Device Id/{printf("%s,", $2)} /PD Type/{print $2}'
Module one, step 6: cat $mega_device | sed -ne "$j"'p' obtains line $j of the list as $line.
Module one, step 7: echo $line | awk -F "," '{print $1}' obtains $device_id from $line.
Module one, step 8: combine the captured information and obtain the SMART information of the hard disk with a smartctl -a -d megaraid,$device_id command.
Module one, step 9: use lspci | grep -E 'Adaptec' ($hasADAPT) to judge whether the environment is a PMC environment, and install the RAID card tool arcconf.
Module one, step 10: use echo "$hasADAPT" | wc -l to obtain the number of PMC RAID cards, $raid_num.
Module one, step 11: use the RAID card number $raid_num to obtain the drive letter $diskname under each RAID card:
./arcconf getconfig $raid_num | grep -i "disk name" | head -n 1 | awk -F: '{print $2}'
Module one, step 12: use the RAID card number $raid_num to obtain the hard disk slot number $slot:
./arcconf getconfig $raid_num | grep -i report | grep -i slot | awk -F '(' '{print $1}' | awk '{print $NF}'
Module one, step 13: use the drive letter $diskname on the RAID card to obtain the sg device $sg_num:
lsscsi -g | grep $diskname | awk '{print $NF}'
Module one, step 14: combine the captured information and obtain the SMART information of the hard disk with the command smartctl -d cciss,$slot -a $sg_num.
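The LSI and PMC branches of the steps above can be wrapped into small functions, as in the sketch below. The function names are hypothetical, and the parsing assumes that the MegaCli64 listing contains "Device Id: N" and "PD Type: T" lines; the smartctl device types megaraid,N and cciss,N are used as in steps 8 and 14.

```shell
#!/bin/sh
# megaraid_devices
# Read "MegaCli64 -pdlist -aall" output on stdin and print one
# "device_id,pd_type" line per physical disk.
megaraid_devices() {
    awk -F ':' '
        /Device Id/ { gsub(/ /, "", $2); printf("%s,", $2) }
        /PD Type/   { gsub(/ /, "", $2); print $2 }
    '
}

# smart_for_megaraid DEVICE
# Query every physical disk behind the LSI controller through DEVICE.
smart_for_megaraid() {
    MegaCli64 -pdlist -aall | megaraid_devices | while IFS=, read -r id type; do
        echo "== device_id $id ($type)"
        smartctl -a -d "megaraid,$id" "$1"
    done
}

# smart_for_pmc SLOT SG_DEVICE
# Command form used for a disk behind a PMC card, given the slot number and
# the sg device obtained from lsscsi -g.
smart_for_pmc() {
    smartctl -d "cciss,$1" -a "$2"
}
```

Reading the MegaCli64 listing from stdin lets the parser be checked against a saved listing before it is run on the server.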
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S110: installing at least a smartcll tool, an sdp arm tool, a fio component and an ipmitool based on Linux, wherein the step of confirming the current test environment by using the smartcll tool and acquiring smart information of all hard disks further comprises the following steps:
in response to the SMART information not being viewable through the direct-connection command, further checking whether the hard disk environment is an LSI RAID or a PMC RAID environment;
in response to determining that the hard disk environment is an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the SMART information of all hard disks with LSI commands;
and in response to determining that the hard disk environment is a PMC RAID environment, installing the RAID card tool arcconf and obtaining the SMART information of all hard disks with PMC commands.
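The environment decision described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the 'Adaptec' keyword comes from the lspci check in module one, while the 'LSI' keyword, the return labels and the function name are hypothetical choices made here.

```python
# Tool to install for each non-direct environment, as named in the text.
RAID_TOOL = {"lsi_raid": "MegaCli64", "pmc_raid": "arcconf"}


def classify_environment(direct_smart_ok: bool, lspci_text: str) -> str:
    """Return 'direct', 'pmc_raid', 'lsi_raid' or 'unknown'.

    direct_smart_ok: whether the direct-connection smartctl command
    returned SMART information. lspci_text: captured output of lspci.
    """
    if direct_smart_ok:
        return "direct"
    if "Adaptec" in lspci_text:  # keyword used by the lspci check in module one
        return "pmc_raid"
    if "LSI" in lspci_text:      # assumed keyword for LSI controllers
        return "lsi_raid"
    return "unknown"
```

For example, a caller could look up RAID_TOOL[classify_environment(False, lspci_output)] to decide which RAID card tool to install before capturing SMART information.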
Further, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S120 (confirming with the sdparm tool that the hard disk cache is enabled, or identifying the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby setting the queue depth and stripe depth of the test) further comprises:
in response to the hard disk environment being a non-direct-connection environment, setting the hard disk to JBOD mode;
for a hard disk in a direct-connection environment, or a hard disk in a non-direct-connection environment that has been set to JBOD mode, confirming whether the hard disk cache is enabled;
in response to the hard disk cache not being enabled, obtaining the interface type of the hard disk from the SMART information and determining the hard disk type based on the interface type;
in response to determining that the hard disk is SAS, enabling the hard disk cache with the sdparm tool and restarting to confirm that the hard disk cache is enabled;
in response to determining that the hard disk is SATA, further obtaining the hard disk manufacturer information from the SMART information, installing the corresponding tool according to the manufacturer information to enable the hard disk cache, and restarting to confirm that the hard disk cache is enabled;
and in response to confirming that the hard disk cache is enabled, setting the disk array card to RAID mode.
This embodiment therefore mainly concerns the function of module two and the invocation of module two by module three.
Specifically, module two automatically identifies the hard disk interface and manufacturer and enables the hard disk cache. Fig. 3 shows a schematic flowchart of an embodiment of enabling the hard disk cache, that is, module two above, according to the hard disk reliability test method of the present invention. Module two comprises the following steps.
Module two, step 1: install sdparm on the system.
Module two, step 2: confirm the current test environment according to module one. In a non-direct-connection environment, the RAID card must be set to JBOD mode.
Module two, step 3: check with the sdparm tool (sdparm -S <drive letter>) whether the WCE of the hard disk equals 1. If WCE = 1, the hard disk cache is already enabled and module two exits.
Module two, step 4: if the WCE of the hard disk is 0, extract the Transport protocol field from the SMART information captured in module one with awk -F ':' '/Transport protocol:/ {gsub(/ /, "", $2); print $2}' to obtain the hard disk interface type $hd_type, and determine whether it is a SAS interface. If it is, enable the cache with sdparm -s WCE=1 -S <drive letter>, restart the system, and confirm again with sdparm -S <drive letter> that the cache is enabled. Then restore the RAID card to RAID mode (the direct-connection mode needs no change) and exit the module.
Module two, step 5: if step 4 shows that the interface is not SAS, the hard disk has a SATA interface. Obtain the hard disk manufacturer from the Device Model field of the captured SMART information. If the disk is a Hitachi, HGST or WDC drive, install the sg_raw tool and enable the cache with sg_raw -s 512 -i WCE. If the disk is a Seagate drive, install the changeWCstate tool and enable the cache with ./changeWCstate. If the disk is a Toshiba drive, enable the cache with smartctl -s wcache-sct,on,p <drive letter>. Restart the system and confirm again with sdparm -S <drive letter> that the cache is enabled. Then restore the RAID card to RAID mode (the direct-connection mode needs no change) and exit the module.
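The branching in module two can be summarized with a short Python sketch that maps the WCE state, interface type and Device Model string to the tool that would be used. It is a sketch under assumptions, not the patented script: the vendor keywords (including treating a model string starting with "ST" as Seagate), the CachePlan type and the function name are hypothetical, and the sg_raw and changeWCstate argument lists are left empty because the description does not give them in full.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CachePlan:
    tool: str
    argv: Optional[List[str]]  # None where the description gives no complete arguments


def plan_cache_enable(wce: int, transport: str, device_model: str,
                      device: str) -> Optional[CachePlan]:
    """Pick the cache-enabling tool; return None if the cache is already on."""
    if wce == 1:
        return None  # step 3: cache already enabled, nothing to do
    if transport.strip().upper().startswith("SAS"):
        # step 4: SAS drive, set WCE=1 and save it with sdparm
        return CachePlan("sdparm", ["sdparm", "--set=WCE=1", "--save", device])
    model = device_model.strip().upper()
    if any(key in model for key in ("HITACHI", "HGST", "WDC")):
        return CachePlan("sg_raw", None)
    if "SEAGATE" in model or model.startswith("ST"):  # "ST" prefix is an assumption
        return CachePlan("changeWCstate", None)
    if "TOSHIBA" in model:
        return CachePlan("smartctl", ["smartctl", "-s", "wcache-sct,on,p", device])
    raise ValueError(f"no cache tool known for model {device_model!r}")
```

Raising an error for an unrecognized manufacturer is a design choice of this sketch; the description does not say how such drives are handled.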
Module three combines module one and module two to perform a random write test on the hard disk. Fig. 4 shows a schematic flowchart of an embodiment of the random write test on the hard disk, that is, module three above, according to the hard disk reliability test method of the present invention. Module three comprises the following steps.
Module three, step 1: install fio and its dependency packages, as well as ipmitool.
Module three, step 2: before testing, enable the hard disk cache according to module two.
Module three, step 3: set the script queue depth TEST_QDEPTH to 8 and 16, and set the stripe depth to several increasing values from 512 B to 2 MB; each combination is cross-tested for 15 min.
In some embodiments of the hard disk reliability test method 100 according to the present invention, step S120 (confirming with the sdparm tool that the hard disk cache is enabled, or identifying the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby setting the queue depth and stripe depth of the test) further comprises:
the queue depth is set to a predetermined number of values, and the stripe depth is set to a plurality of increasing values within a predetermined interval, wherein each combination of values is cross-tested for a predetermined time interval.
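The cross-test of queue depths and stripe depths can be sketched in Python as follows. The doubling progression between 512 B and 2 MB, the choice of the libaio engine, the job name and the function names are assumptions for illustration; the description only states the two queue depths, the 512 B to 2 MB range and the 15 min duration per combination.

```python
from itertools import product
from typing import List, Tuple

QUEUE_DEPTHS = (8, 16)          # TEST_QDEPTH values from the description
RUNTIME_SECONDS = 15 * 60       # 15 min per combination


def stripe_depths(lo: int = 512, hi: int = 2 * 1024 * 1024) -> List[int]:
    """Increasing block sizes in bytes; doubling each step is an assumption."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def cross_test_matrix() -> List[Tuple[int, int]]:
    """Every (queue depth, stripe depth) pair to be tested."""
    return list(product(QUEUE_DEPTHS, stripe_depths()))


def fio_command(device: str, iodepth: int, block_size: int) -> List[str]:
    """Build a fio random-write command line for one combination."""
    return [
        "fio", "--name=randwrite", f"--filename={device}",
        "--rw=randwrite", "--ioengine=libaio", "--direct=1",
        f"--bs={block_size}", f"--iodepth={iodepth}",
        "--time_based", f"--runtime={RUNTIME_SECONDS}",
    ]
```

Under these assumptions the matrix has 26 combinations (2 queue depths by 13 block sizes), so one full pass takes 6.5 hours at 15 min each.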
Module three, step 4: during the test, cyclically and continuously access the SMART information of the hard disk according to module one.
Module three, step 5: during the test, use ipmitool to adjust the fan speed after every 30 min of testing according to preset increasing values, preferably 30%, 85% and 100%.
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S130 (executing, according to the set queue depth and stripe depth, a continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition) further comprises: during the continuous cycle test, adjusting the fan speed with ipmitool at every predetermined time interval according to a plurality of preset increasing values.
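The fan schedule can be sketched as a pure function of elapsed test time. Cycling back to the first value after the last one is an assumption, as are the function names; the actual ipmitool invocation for setting fan duty is vendor-specific and is therefore represented by a caller-supplied callback rather than a concrete command.

```python
from typing import Callable, Sequence

FAN_DUTIES = (30, 85, 100)   # percent, preferred values from the description
STEP_MINUTES = 30            # adjust after every 30 min of testing


def fan_duty_at(elapsed_minutes: int,
                duties: Sequence[int] = FAN_DUTIES,
                step_minutes: int = STEP_MINUTES) -> int:
    """Fan duty (percent) that applies at the given elapsed test time."""
    return duties[(elapsed_minutes // step_minutes) % len(duties)]


def apply_fan_schedule(elapsed_minutes: int,
                       set_fan_duty: Callable[[int], None]) -> int:
    """Call the hypothetical set_fan_duty callback with the scheduled duty."""
    duty = fan_duty_at(elapsed_minutes)
    set_fan_duty(duty)  # e.g. a wrapper around a board-specific ipmitool command
    return duty
```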
Module three, step 6: the following three conditions are met during the test:
a. Random writes in WCE mode are cached in mDRAM;
b. A random-write FUA workload overlaps the WCE data written previously;
c. No flush (refresh) action is performed.
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S130 (executing, according to the set queue depth and stripe depth, a continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition) further comprises:
the given stress test condition includes that the test satisfies:
random writes in WCE mode are cached in mDRAM;
a random-write FUA workload overlaps the WCE data written previously; and
no flush (refresh) action is performed.
Module three, step 7: run the test continuously for a predetermined execution time, for example 12 h, and then power off.
Module three, step 8: after a predetermined pause time, for example 30 s, wake the machine; the script then automatically runs the test again for the predetermined execution time, for example 12 h.
Module three, step 9: under these conditions, invalid address information may be written into the NVC, and the hard disk becomes locked after it reads the NVC content following the restart. SMART information access and fan speed continuously affect hard disk performance, so testing in such an adverse environment exposes more problems.
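The run, power-off, pause and wake sequence of steps 7 and 8 can be laid out as a simple plan. This Python sketch only enumerates the phases and their durations, using the 12 h and 30 s example values; the Phase type, phase names and function name are hypothetical, and how the machine is actually powered off and woken is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    name: str
    seconds: int


def power_cycle_plan(cycles: int,
                     run_seconds: int = 12 * 3600,
                     pause_seconds: int = 30) -> List[Phase]:
    """Phases for the requested number of test cycles, in execution order."""
    plan: List[Phase] = []
    for _ in range(cycles):
        plan.append(Phase("random_write_test", run_seconds))
        plan.append(Phase("power_off", 0))
        plan.append(Phase("pause", pause_seconds))
        plan.append(Phase("wake", 0))
    return plan
```

A driver script could iterate over the plan and dispatch each phase name to its own handler, checking for a locked hard disk after every wake.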
According to the embodiments of the invention, the influence of test-environment risk points on the reading and writing of mechanical hard disks is amplified under Linux: the application environment is identified automatically, the SMART information of the hard disk is accessed continuously in the manner suited to each environment, the hard disk interface is identified automatically so that the hard disk cache can be enabled, the fan speed is regulated, and a random write test is run automatically on the hard disk according to the stripe depth and queue depth. The invention thus adopts an unconventional test method, which avoids the situation in which problems cannot be detected because the hard disk supplier has optimized the OD/ID of the hard disk for conventional test modes. It extends the influence that accessing the hard disk SMART information through LPC commands has on hard disk reading and writing, so that the influence becomes persistent, and it amplifies the influence of fan speed on hard disk reading and writing. Enabling the hard disk cache and cutting power while the hard disk is being written raises the read/write speed but also increases the risk of data loss on abnormal power failure. The influence of the test environment is thereby amplified continuously, so that problems are discovered and solved early.
In a second aspect of the present invention, a system 200 for testing reliability of a hard disk is also provided. FIG. 5 shows a schematic block diagram of an embodiment of a hard disk reliability test system 200 according to the present invention. As shown in fig. 5, the system includes:
an environment configuration module 210, the environment configuration module 210 configured to install at least the smartctl tool, the sdparm tool, the fio component and ipmitool under Linux, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
a condition setting module 220, the condition setting module 220 configured to confirm with the sdparm tool that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby to set the queue depth and stripe depth of the test;
a continuous test module 230, the continuous test module 230 configured to execute, according to the set queue depth and stripe depth, a continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition;
a loop execution module 240, the loop execution module 240 configured to power off the machine after the continuous cycle test has run for a predetermined execution time and to run the continuous cycle test for the predetermined execution time again after a predetermined pause time has elapsed;
a result monitoring module 250, wherein the result monitoring module 250 is configured to monitor whether the hard disk is locked in all test periods to test the reliability of the hard disk.
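How the five modules might cooperate can be sketched as a small orchestration function. This is a hypothetical illustration only: each module is represented by a caller-supplied callable, and the way a locked hard disk is detected is not specified in the description, so it appears here as an injected is_locked check.

```python
from typing import Callable


def run_reliability_test(configure_environment: Callable[[], None],
                         set_conditions: Callable[[], None],
                         run_continuous_test: Callable[[], None],
                         power_cycle: Callable[[], None],
                         is_locked: Callable[[], bool],
                         cycles: int) -> bool:
    """Return True if the hard disk never locked during any test cycle."""
    configure_environment()      # role of module 210: install tools, capture SMART
    set_conditions()             # role of module 220: enable cache, set depths
    for _ in range(cycles):
        run_continuous_test()    # role of module 230: random-write stress test
        if is_locked():          # role of module 250: monitor for a locked disk
            return False
        power_cycle()            # role of module 240: power off, pause, wake
        if is_locked():
            return False
    return True
```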
In a third aspect of the embodiment of the present invention, a computer-readable storage medium is further provided, and fig. 6 is a schematic diagram illustrating the computer-readable storage medium of the hard disk reliability testing method according to the embodiment of the present invention. As shown in fig. 6, the computer-readable storage medium 300 stores computer program instructions 310, the computer program instructions 310 being executable by a processor. The computer program instructions 310, when executed, implement the method of any of the embodiments described above.
It should be understood that all embodiments, features and advantages set forth above with respect to the method for testing reliability of a hard disk according to the present invention apply equally, without conflict with each other, to the system for testing reliability of a hard disk and to the storage medium according to the present invention.
In a fourth aspect of the embodiments of the present invention, there is further provided a computer device 400, comprising a memory 420 and a processor 410, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the method of any one of the above embodiments.
Fig. 7 is a schematic diagram of a hardware structure of an embodiment of a computer device for executing a method for testing reliability of a hard disk according to the present invention. Taking the computer device 400 shown in fig. 7 as an example, the computer device includes a processor 410 and a memory 420, and may further include: an input device 430 and an output device 440. The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or other means, as exemplified by the bus connection in fig. 7. Input device 430 may receive input numeric or character information and generate signal inputs related to a hard disk reliability test. The output device 440 may include a display device such as a display screen.
The memory 420 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the hard disk reliability test method in the embodiments of the present application. The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by use of the hard disk reliability test method, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 410 executes various functional applications of the server and data processing by executing nonvolatile software programs, instructions and modules stored in the memory 420, that is, implements the method of the above-described method embodiment.
Finally, it should be noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as synchronous RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A hard disk reliability test method, characterized by comprising the following steps:
installing at least a smartctl tool, an sdparm tool, a fio component and ipmitool under Linux, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
confirming with the sdparm tool that the hard disk cache is enabled, or identifying the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby setting the queue depth and stripe depth of the test;
executing, according to the set queue depth and stripe depth, a continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition;
powering off the machine after the continuous cycle test has run for a predetermined execution time, and running the continuous cycle test for the predetermined execution time again after a predetermined pause time has elapsed;
and monitoring whether the hard disk is locked during all test periods, so as to test the reliability of the hard disk.
2. The method according to claim 1, wherein executing, according to the set queue depth and stripe depth, the continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition, further comprises:
the given stress test condition includes that the test satisfies:
random writes in WCE mode are cached in mDRAM;
a random-write FUA workload overlaps the WCE data written previously; and
no flush (refresh) action is performed.
3. The method according to claim 1, wherein confirming with the sdparm tool that the hard disk cache is enabled, or identifying the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby setting the queue depth and stripe depth of the test, further comprises:
the queue depth is set to a predetermined number of values, and the stripe depth is set to a plurality of increasing values within a predetermined interval, wherein each combination of values is cross-tested for a predetermined time interval.
4. The method according to any one of claims 1 to 3, wherein installing at least the smartctl tool, the sdparm tool, the fio component and ipmitool under Linux, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks, further comprises:
using the smartctl tool to detect whether SMART information can be viewed with a direct-connection command, so as to determine whether the hard disk environment is a direct-connection environment;
and in response to the SMART information being viewable normally, determining that the hard disk environment is a direct-connection environment and capturing the SMART information of all hard disks through the direct-connection command.
5. The method according to claim 4, wherein installing at least the smartctl tool, the sdparm tool, the fio component and ipmitool under Linux, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks, further comprises:
in response to the SMART information not being viewable through the direct-connection command, further checking whether the hard disk environment is an LSI RAID or a PMC RAID environment;
in response to determining that the hard disk environment is an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the SMART information of all hard disks with LSI commands;
and in response to determining that the hard disk environment is a PMC RAID environment, installing the RAID card tool arcconf and obtaining the SMART information of all hard disks with PMC commands.
6. The method according to claim 5, wherein confirming with the sdparm tool that the hard disk cache is enabled, or identifying the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby setting the queue depth and stripe depth of the test, further comprises:
in response to the hard disk environment being a non-direct-connection environment, setting the hard disk to JBOD mode;
for a hard disk in a direct-connection environment, or a hard disk in a non-direct-connection environment that has been set to JBOD mode, confirming whether the hard disk cache is enabled;
in response to the hard disk cache not being enabled, obtaining the interface type of the hard disk from the SMART information and determining the hard disk type based on the interface type;
in response to determining that the hard disk is SAS, enabling the hard disk cache with the sdparm tool and restarting to confirm that the hard disk cache is enabled;
in response to determining that the hard disk is SATA, further obtaining the hard disk manufacturer information from the SMART information, installing the corresponding tool according to the manufacturer information to enable the hard disk cache, and restarting to confirm that the hard disk cache is enabled;
and in response to confirming that the hard disk cache is enabled, setting the disk array card to RAID mode.
7. The method according to any one of claims 1 to 3, wherein executing, according to the set queue depth and stripe depth, the continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition, further comprises:
during the continuous cycle test, adjusting the fan speed with ipmitool at every predetermined time interval according to a plurality of preset increasing values.
8. A hard disk reliability test system, comprising:
an environment configuration module configured to install at least a smartctl tool, an sdparm tool, a fio component and ipmitool under Linux, wherein the smartctl tool is used to confirm the current test environment and acquire the SMART information of all hard disks;
a condition setting module configured to confirm with the sdparm tool that the hard disk cache is enabled, or to identify the hard disk type and/or manufacturer with the sdparm tool so that the hard disk cache can be enabled with the corresponding tool, and thereby to set the queue depth and stripe depth of the test;
a continuous test module configured to execute, according to the set queue depth and stripe depth, a continuous cycle test based on random write processing of the hard disk through access to SMART information by the smartctl tool and/or setting of the fan by ipmitool, wherein the continuous cycle test satisfies a given stress test condition;
a loop execution module configured to power off the machine after the continuous cycle test has run for a predetermined execution time and to run the continuous cycle test for the predetermined execution time again after a predetermined pause time has elapsed;
and a result monitoring module configured to monitor whether the hard disk is locked during all test periods, so as to test the reliability of the hard disk.
9. A computer-readable storage medium, having stored thereon computer program instructions which, when executed, implement the hard disk reliability testing method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, characterized in that the memory has stored therein a computer program which, when executed by the processor, performs the hard disk reliability test method according to any one of claims 1 to 7.
CN202210747664.9A 2022-06-29 2022-06-29 Hard disk reliability test method, system, storage medium and equipment Active CN115114053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210747664.9A CN115114053B (en) 2022-06-29 2022-06-29 Hard disk reliability test method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN115114053A true CN115114053A (en) 2022-09-27
CN115114053B CN115114053B (en) 2024-05-14

Family

ID=83330224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210747664.9A Active CN115114053B (en) 2022-06-29 2022-06-29 Hard disk reliability test method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115114053B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899119A (en) * 2015-05-21 2015-09-09 浪潮电子信息产业股份有限公司 Method for automatically detecting hard disk abnormality
CN110413463A (en) * 2019-06-29 2019-11-05 苏州浪潮智能科技有限公司 A kind of SMART information inspection method of hard disk
CN110992992A (en) * 2019-10-31 2020-04-10 苏州浪潮智能科技有限公司 Hard disk test method, device and storage medium

Also Published As

Publication number Publication date
CN115114053B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant