CN115114053B - Hard disk reliability test method, system, storage medium and equipment - Google Patents

Info

Publication number
CN115114053B
Authority
CN
China
Prior art keywords
hard disk
test
tool
sdparm
smartctl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210747664.9A
Other languages
Chinese (zh)
Other versions
CN115114053A (en)
Inventor
葛均红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210747664.9A
Publication of CN115114053A
Application granted
Publication of CN115114053B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides a hard disk reliability test method, system, storage medium and equipment. The method comprises: installing at least smartctl, sdparm, a fio component and ipmitool on Linux, confirming the current test environment with smartctl, and obtaining smart information of all hard disks; using sdparm to confirm that the hard disk cache is opened, or using sdparm to identify the hard disk type and/or the hard disk manufacturer so that a corresponding tool can open the hard disk cache, thereby setting the queue depth and the stripe depth of the test; according to the set queue depth and stripe depth, performing a continuous loop test based on random write processing to the hard disk while accessing smart information with smartctl and/or setting the fans with ipmitool, the continuous loop test satisfying a given pressure test condition; powering off the machine after the continuous loop test has been executed for a predetermined execution time, and executing the continuous loop test for the predetermined execution time again after a predetermined pause time has elapsed; and monitoring whether the hard disk becomes locked during all test periods, so as to test the reliability of the hard disk.

Description

Hard disk reliability test method, system, storage medium and equipment
Technical Field
The invention relates to the technical field of servers, in particular to server testing, and more particularly to reliability testing of server hard disks.
Background
With the continuous development of technology in the IT field, a server product provider must ensure the stability and reliability of its servers in a wide range of application environments while continuing to meet users' needs. The quality of the hard disk is particularly important to the stability and reliability of the whole server: a poor-quality hard disk may cause serious server problems and serious adverse effects at the customer site. For hard disks used in servers, testers must therefore consider how to reduce, at the test level, the online failure rate at the customer site of the main hard disks being promoted, and how to intercept shipments of poor-quality hard disks. Summarizing, generalizing, refining and developing methods from accumulated experience and lessons follows the "0 defect" culture of the INSPUR quality initiative, and is also one way for testers to pursue "0 defect" in their own work.
The quality and reliability of a hard disk on a server are usually verified with the fio tool: the performance and pressure of the hard disk are tested in a steady-state environment, and after the test completes the smart information of the hard disk is checked for reported errors, as are the system and the BMC. Although the scripts differ, the test is essentially a fio Baseline test of the hard disk with the OS in a steady state.
The existing test method mainly relies on sg_dd and the batch script HDD_Fio_test_Linux.sh. Its main steps are as follows:
1) On a Linux system, pre-install libaio and then install fio;
2) Run sg_dd or HDD_Fio_test_Linux.sh, add the drive letter of the disk under test, and perform the 4krr, 4krw, 256ksr and 256ksw tests (4k random read, 4k random write, 256k sequential read and 256k sequential write).
HDD_Fio_test_Linux.sh is the more commonly used of the two, but it runs with the system in a stable state and generally reveals no problems, and hard disk vendors optimize the OD/ID of the hard disk for this conventional data-block test pattern. As a result, problems in the hard disk itself go undetected by this conventional test method.
In addition, non-read/write LPC commands such as SMART READ DATA, SMART READ LOG and SMART RETURN STATUS are sent by the system periodically rather than in real time, so their effect on hard disk reads and writes is periodic rather than continuous.
In addition, to protect hard disk data the hard disk cache is normally in a closed state; when the cache is opened to improve read/write speed, problems are more likely to occur.
Furthermore, the fan speed is low in a stable environment and therefore has little influence on the hard disk.
Finally, when a bus trace is captured with sg_dd during debugging, an IDENTIFY command is sent on every execution; this disturbs trace capture and cannot be controlled.
Therefore, to address the above drawbacks of the prior art, an optimized hard disk reliability test method is needed that tests the reliability of mechanical hard disks by amplifying environmental risk points under Linux, avoiding the situation in which the prior art is uncontrollable and overly conventional test methods fail to expose problems in advance.
Disclosure of Invention
In view of the above, the present invention is directed to an improved method, system, storage medium and device for testing the reliability of a hard disk in a Linux system, so as to solve the above-mentioned problems in the prior art.
Based on the above object, in one aspect, the present invention provides a method for testing reliability of a hard disk, wherein the method comprises the following steps:
installing at least the smartctl tool, the sdparm tool, a fio component and ipmitool based on Linux, wherein the smartctl tool is used to confirm the current test environment and acquire smart information of all hard disks;
confirming with the sdparm tool that the hard disk Cache is opened, or identifying the hard disk type and/or the hard disk manufacturer with the sdparm tool so as to open the hard disk Cache with a corresponding tool, thereby setting the queue depth and the stripe depth of the test;
performing, according to the set queue depth and stripe depth, a continuous loop test based on random write processing to the hard disk while accessing smart information with the smartctl tool and/or setting a fan with the ipmitool, wherein the continuous loop test satisfies a given pressure test condition;
powering off the machine after the continuous loop test has been executed for a predetermined execution time, and executing the continuous loop test for the predetermined execution time again after a predetermined pause time has elapsed; and
monitoring whether the hard disk becomes locked during all test periods, so as to test the reliability of the hard disk.
In some embodiments of the method for testing reliability of a hard disk according to the present invention, the performing a continuous loop test based on random write processing to the hard disk by using the smartctl tool to access smart information and/or using the ipmitool to set a fan according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition further comprises:
the given pressure test condition requires that the test satisfy the following:
random write processing in WC (write cache) mode is cached in mDRAM;
the random-write FUA (Force Unit Access) workload is superimposed on previously written WCE data; and
any refresh action is disabled.
In some embodiments of the method for testing the reliability of a hard disk according to the present invention, the determining, by the sdparm tool, that the hard disk Cache is opened or identifying, by the sdparm tool, a hard disk type and/or a hard disk vendor to open the hard disk Cache by using the corresponding tool, so as to set a queue depth and a stripe depth of the test further includes:
The queue depth is set to a predetermined number of values and the stripe depth is set to a plurality of increasing values within a predetermined interval, wherein each value is cross tested for a predetermined time interval.
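The cross test described above can be pictured as enumerating every combination of queue depth and stripe depth. The following is only a minimal sketch: the concrete values in QUEUE_DEPTHS and STRIPE_KB and the function name cross_matrix are hypothetical, since the text only speaks of "a predetermined number of values" and "increasing values within a predetermined interval", and it does not say how a stripe depth is passed to the test tool.

```shell
#!/bin/sh
# Hypothetical example values; the source does not state the actual numbers.
QUEUE_DEPTHS="1 32"
STRIPE_KB="4 8 16 32"

# cross_matrix "QD LIST" "STRIPE LIST"
# Prints one "queue_depth stripe_depth" pair per line, covering every
# combination of the two lists.
cross_matrix() {
    for qd in $1; do
        for stripe in $2; do
            printf '%s %s\n' "$qd" "$stripe"
        done
    done
}

# Each printed pair would be tested for the predetermined time interval.
cross_matrix "$QUEUE_DEPTHS" "$STRIPE_KB"
```

Keeping the enumeration separate from the test invocation makes it easy to check the coverage of the matrix before any disk is written.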
In some embodiments of the hard disk reliability test method according to the present invention, the Linux-based installation of at least smartctl tools, sdparm tools, and fio components and ipmitool, wherein the validating the current test environment and obtaining smart information of all hard disks using the smartctl tools further comprises:
detecting whether the smart information can be viewed with the smartctl tool, so as to judge whether the hard disk environment is a direct-connection environment; and
in response to the smart information being viewed normally, judging the hard disk environment to be a direct-connection environment and capturing the smart information of all hard disks with the direct-connection command.
In some embodiments of the hard disk reliability test method according to the present invention, the Linux-based installation of at least smartctl tools, sdparm tools, and fio components and ipmitool, wherein the validating the current test environment and obtaining smart information of all hard disks using the smartctl tools further comprises:
in response to the smart information not being viewable with the direct-connection command, further checking whether the hard disk environment is an LSI RAID or a PMC RAID environment;
in response to determining that the hard disk environment is an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the smart information of all hard disks with the LSI commands; and
in response to determining that the hard disk environment is a PMC RAID environment, installing the RAID card tool arcconf and obtaining the smart information of all hard disks with the PMC commands.
In some embodiments of the method for testing the reliability of a hard disk according to the present invention, the determining, by the sdparm tool, that the hard disk Cache is opened or identifying, by the sdparm tool, a hard disk type and/or a hard disk vendor to open the hard disk Cache by using the corresponding tool, so as to set a queue depth and a stripe depth of the test further includes:
setting the hard disk to JBOD mode in response to the hard disk environment being a non-direct-connection environment;
confirming whether the hard disk Cache is opened, for a hard disk in a direct-connection environment or for a hard disk in a non-direct-connection environment that has been set to JBOD mode;
in response to confirming that the hard disk Cache is not opened, acquiring the hard disk interface type from the smart information and judging the hard disk type based on the interface type;
in response to judging that the hard disk is SAS, opening the hard disk Cache with the sdparm tool and restarting to confirm that the hard disk Cache is opened;
in response to judging that the hard disk is SATA, further acquiring hard disk manufacturer information from the smart information, installing a corresponding tool according to the manufacturer information, opening the hard disk Cache and restarting to confirm that the hard disk Cache is opened; and
in response to confirming that the hard disk Cache is opened, setting the disk array card to RAID mode.
In some embodiments of the method for testing reliability of a hard disk according to the present invention, the performing a continuous loop test based on random write processing to the hard disk by using the smartctl tool to access smart information and/or using the ipmitool to set a fan according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition further comprises:
regulating the fan speed with the ipmitool according to a plurality of preset incremental values, one at every preset time interval, during the continuous loop test.
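As an illustration of stepping the fan through incremental values, the sketch below only builds the schedule. The interval and duty values and the name fan_schedule are hypothetical, and the ipmitool invocation that actually applies a fan speed is deliberately left out because it is platform-specific and is not given in the text.

```shell
#!/bin/sh
# fan_schedule INTERVAL_MINUTES DUTY1 DUTY2 ...
# Prints "elapsed_minutes duty_percent" for each step, one step per interval.
fan_schedule() {
    interval=$1
    shift
    elapsed=0
    for duty in "$@"; do
        printf '%s %s\n' "$elapsed" "$duty"
        elapsed=$((elapsed + interval))
    done
}

# Hypothetical example: raise the fan duty every 30 minutes.
fan_schedule 30 40 60 80 100
```

A caller would read each line, wait until the elapsed time is reached, and then apply the duty value with whatever ipmitool command the platform's BMC supports.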
In another aspect of the present invention, there is also provided a hard disk reliability test system, including:
an environment configuration module configured to install at least the smartctl tool, the sdparm tool, a fio component and ipmitool based on Linux, wherein the smartctl tool is used to confirm the current test environment and acquire smart information of all hard disks;
a condition setting module configured to confirm with the sdparm tool that the hard disk Cache is opened, or to identify the hard disk type and/or the hard disk manufacturer with the sdparm tool so as to open the hard disk Cache with a corresponding tool, thereby setting the queue depth and the stripe depth of the test;
a continuous test module configured to perform, according to the set queue depth and stripe depth, a continuous loop test based on random write processing to the hard disk while accessing smart information with the smartctl tool and/or setting a fan with the ipmitool, wherein the continuous loop test satisfies a given pressure test condition;
a cycle execution module configured to power off the machine after the continuous loop test has been executed for a predetermined execution time, and to execute the continuous loop test for the predetermined execution time again after a predetermined pause time has elapsed; and
a result monitoring module configured to monitor whether the hard disk becomes locked during all test periods, so as to test the reliability of the hard disk.
In yet another aspect of the present invention, there is also provided a computer readable storage medium storing computer program instructions that when executed implement any of the above methods for testing the reliability of a hard disk according to the present invention.
In yet another aspect of the present invention, there is also provided a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, performs any one of the above methods for testing the reliability of a hard disk according to the present invention.
With the method of the invention, the influence of test-environment risk points on the reads and writes of a mechanical hard disk is amplified. The application environment is identified automatically; the hard disk smart information is accessed continuously under different environments, so that hard disk reads and writes are continuously affected; the hard disk cache is identified and opened automatically, which enlarges the risk of data loss on abnormal power failure; the fan speed is regulated, which strengthens the RV influence; and a random write test is run on the hard disk automatically with abnormal power failure, in which the random-write Force Unit Access (FUA) load overlaps the written WCE data. This effectively avoids the situation in which a conventional test method exposes no problems. Based on the above technical scheme, the invention has at least the following beneficial technical effects:
1. An unconventional test method is adopted, avoiding the situation in which no problems appear because the hard disk supplier has optimized the OD/ID of the hard disk for the conventional test mode;
2. The influence on hard disk reads and writes of LPC commands accessing the hard disk smart information is amplified and made continuous;
3. The influence of the fan speed on hard disk reads and writes is amplified;
4. The hard disk cache is opened, which raises the read/write speed, and the hard disk is powered off while writing, which enlarges the risk of data loss on abnormal power failure; the influence of the test environment is thus continuously amplified, so that problems are found and solved early.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
In the drawings:
FIG. 1 shows a schematic block diagram of an embodiment of a hard disk reliability test method according to the present invention;
FIG. 2 shows a schematic flow chart of an embodiment of hard disk smart information grabbing of a hard disk reliability test method in accordance with the present invention;
FIG. 3 shows a schematic flow chart of an embodiment of opening the hard disk Cache according to the hard disk reliability test method of the present invention;
FIG. 4 shows a schematic flow chart of an embodiment of random write testing of a hard disk according to the hard disk reliability test method of the present invention;
FIG. 5 shows a schematic block diagram of an embodiment of a hard disk reliability test system in accordance with the present invention;
FIG. 6 illustrates a schematic diagram of an embodiment of a computer readable storage medium implementing a hard disk reliability test method in accordance with the present invention;
FIG. 7 shows a schematic diagram of the hardware configuration of an embodiment of a computer device implementing a hard disk reliability test method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two different entities or parameters that share the same name; they are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article or device that comprises a list of steps or units may also include other steps or units.
To solve the problems and avoid the defects of the prior art, the invention provides a method that tests the reliability of a mechanical hard disk by amplifying environmental risk points under Linux, overcoming the shortcomings that the prior art is uncontrollable and that overly conventional test methods cannot expose problems in advance. In short, the concept of the invention is to automatically identify the application environment; continuously access the hard disk smart information under different environments so that hard disk reads and writes are continuously affected; automatically identify the hard disk interface and open the hard disk cache accordingly, enlarging the risk of data loss on abnormal power failure; regulate the fan speed to strengthen the RV influence; and automatically run a random write test on the hard disk with abnormal power failure, overlapping the random-write Force Unit Access (FUA) load with the written WCE data. This effectively avoids the situation in which a conventional test method exposes no problems.
Thus, for ease of programming and execution, the following three modules may be constructed by function and invoked as needed while implementing the method. Module one: automatically identify the environment, and access and grab the smart information of the hard disk. Module two: automatically identify the hard disk interface and manufacturer, and open the hard disk cache. Module three: combine the first two modules and perform the random write test on the hard disk. Modules one, two and three are described in detail below.
To this end, in a first aspect of the present invention, a method 100 for testing the reliability of a hard disk is provided. FIG. 1 shows a schematic block diagram of an embodiment of a hard disk reliability test method according to the present invention. In the embodiment shown in FIG. 1, the method comprises:
Step S110: installing at least the smartctl tool, the sdparm tool, a fio component and ipmitool based on Linux, wherein the smartctl tool is used to confirm the current test environment and acquire smart information of all hard disks;
Step S120: confirming with the sdparm tool that the hard disk Cache is opened, or identifying the hard disk type and/or the hard disk manufacturer with the sdparm tool so as to open the hard disk Cache with a corresponding tool, thereby setting the queue depth and the stripe depth of the test;
Step S130: performing, according to the set queue depth and stripe depth, a continuous loop test based on random write processing to the hard disk while accessing smart information with the smartctl tool and/or setting a fan with the ipmitool, wherein the continuous loop test satisfies a given pressure test condition;
Step S140: powering off the machine after the continuous loop test has been executed for a predetermined execution time, and executing the continuous loop test for the predetermined execution time again after a predetermined pause time has elapsed;
Step S150: monitoring whether the hard disk becomes locked during all test periods, so as to test the reliability of the hard disk.
In general, to solve the above problems of the prior art, the invention tests the reliability of a mechanical hard disk by amplifying environmental risk points under Linux. To amplify the environmental risk points and automatically execute the corresponding test procedure, the test environment must be configured first. Therefore, in step S110, at least the smartctl tool, the sdparm tool, a fio component and ipmitool are installed based on Linux, and the smartctl tool is used to confirm the current test environment and obtain smart information of all hard disks. Here, step S110 preferably mainly involves executing the aforementioned module three and its call to module one.
Then, after the test environment is configured, the test conditions need to be set. Therefore, in step S120, the sdparm tool is used to confirm that the hard disk Cache is opened or the sdparm tool is used to identify the hard disk type and/or the hard disk manufacturer to use the corresponding tool to open the hard disk Cache, so as to set the queue depth and the stripe depth of the test. Here, preferably, step S120 mainly involves executing the aforementioned module three and its call to the module two.
On this basis, according to the queue depth and stripe depth set in step S120, a continuous loop test based on random write processing to the hard disk is performed in step S130 by accessing the smart information using the smartctl tool and/or setting the fan using the ipmitool, wherein the continuous loop test satisfies a given pressure test condition. Here, preferably, step S130 includes executing the aforementioned module three and its call to module one.
In order to obtain a sufficiently effective test, the test generally needs to be repeated many times and adjusted as required. Thus, in step S140, the continuous loop test is executed for a predetermined execution time, for example 12 hours (h), after which the machine is powered off; after a predetermined pause time, for example 30 seconds (s), has elapsed, the continuous loop test is executed again for the predetermined execution time, for example 12 hours (h).
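Because the machine loses power between runs, the repetition in step S140 needs some state that survives the power cut. The text does not describe how this is done; the sketch below assumes, purely for illustration, that the test script is relaunched at boot and keeps a counter in a state file. The function name next_cycle, the state-file approach and the cycle limit are all hypothetical.

```shell
#!/bin/sh
# next_cycle STATE_FILE MAX_CYCLES
# Reads the number of cycles already started from STATE_FILE (0 if the file
# is missing or empty). If fewer than MAX_CYCLES have been started, records
# the new cycle number and prints "run N"; otherwise prints "done".
next_cycle() {
    state_file=$1
    max_cycles=$2
    started=$(cat "$state_file" 2>/dev/null)
    started=${started:-0}
    if [ "$started" -ge "$max_cycles" ]; then
        echo "done"
        return 0
    fi
    current=$((started + 1))
    # Record the cycle before the test starts, since power is cut abruptly.
    echo "$current" > "$state_file"
    echo "run $current"
}

# Demonstration with a temporary state file and a limit of two cycles.
state=$(mktemp)
next_cycle "$state" 2
next_cycle "$state" 2
next_cycle "$state" 2
rm -f "$state"
```

On a "run N" result, the caller would start the continuous loop test for the predetermined execution time (12 h in the example above) and then trigger the power-off; the 30 s pause is enforced by whatever restores power.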
Finally, in step S150, whether the hard disk becomes locked is monitored during all test periods to test the reliability of the hard disk. Specifically, during the tests performed in steps S130 to S140, invalid address information may be written into the NVC, and the hard disk may lock when it reads the NVC content after a restart. In addition, smart information access and the fan speed continuously affect hard disk performance, so the test runs in a harsh environment and can expose more problems. Monitoring for a locked hard disk during all test periods therefore makes it possible to determine whether the hard disk locks and, in turn, to evaluate its reliability.
The method according to the invention is further explained below in connection with specific examples. It should be noted that the following specific examples are intended to illustrate the implementation of the method according to the invention and should not be construed as limiting the method according to the invention.
According to the method, the influence of test-environment risk points on the reads and writes of a mechanical hard disk is amplified under Linux. This requires automatically identifying the application environment, continuously accessing the hard disk smart information under different environments, automatically identifying the hard disk interface so as to open the hard disk cache in a targeted manner, regulating the fan speed, and automatically performing the random write test on the hard disk according to the stripe depth and the queue depth.
First, the function of the first module is to automatically identify the environment and access and grab the smart information of the hard disk. Fig. 2 shows a schematic flow chart of an embodiment of hard disk smart information grabbing, i.e. the above module one, of a hard disk reliability test method according to the present invention. The first module includes the following aspects.
Module one 1. Install a Linux system and the smartctl tool.
Module one 2. Use smartctl -a /dev/sd<drive letter> to check whether smart information can be obtained normally for all hard disks.
Module one 3. If the smart information can be obtained, the environment is a direct-connection environment; grab the information with the command in 2, and the module exits.
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S110: installing at least smartctl tools, sdparm tools, fio components and ipmitool based on Linux, wherein validating the current test environment and obtaining smart information of all hard disks using the smartctl tools further comprises:
detecting whether the smart information can be viewed with the smartctl tool, so as to judge whether the hard disk environment is a direct-connection environment; and
in response to the smart information being viewed normally, judging the hard disk environment to be a direct-connection environment and capturing the smart information of all hard disks with the direct-connection command.
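One way to decide whether the smart information "can be viewed" is to inspect the exit status of smartctl, whose low three bits signal a command-line error, a failure to open the device, or a failed SMART command. The following is a minimal sketch under that assumption; the function name smart_readable and the overridable SMARTCTL variable are hypothetical and are not part of the described method.

```shell
#!/bin/sh
# Command used to query a disk; can be overridden, e.g. for testing.
SMARTCTL=${SMARTCTL:-smartctl}

# smart_readable DEVICE
# Returns 0 when "smartctl -a DEVICE" could parse its command line, open the
# device and complete its SMART commands (exit-status bits 0-2 all clear).
# Higher bits report disk health findings and do not mean the query failed.
smart_readable() {
    "$SMARTCTL" -a "$1" >/dev/null 2>&1
    status=$?
    [ $((status & 7)) -eq 0 ]
}

# Example use (commented out because it touches real devices):
# for dev in /dev/sd?; do
#     smart_readable "$dev" || echo "$dev: not readable directly"
# done
```

If every disk is readable this way, the environment would be treated as direct-connection; otherwise the RAID checks that follow apply.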
Module one 4. If the smart information cannot be obtained, use lspci | grep -E "LSI|AVAGO" to determine whether the environment is in LSI RAID mode ($hasLsi); an LSI environment requires the MegaCli tool to be installed.
Module one 5. Use MegaCli64 -pdlist -aall | awk -F ":" '/Device Id/{printf("%s,", $2)} /PD Type/{print $2}' to obtain megaraid_device ($mega_device).
Module one 6. Use cat $mega_device | sed -ne "$j"'p' to get line $line.
Module one 7. Use echo $line | awk -F "," '{print $1}' to obtain $device_id from $line.
Module one 8. Integrate the grabbed information and obtain the hard disk smart information with the command smartctl -a -d megaraid,$device_id.
Module one 9. Use lspci | grep -E "Adaptec" to determine whether the environment is PMC ($hasAdapt); if so, install the RAID card tool arcconf.
Module one 10. Use echo "$hasAdapt" | wc -l to obtain the number of PMC RAID cards, $raid_num.
Module one 11. Use the RAID card number $raid_num to get the drive letter $diskname under each RAID card: ./arcconf getconfig $raid_num | grep -i "disk name" | head -n 1 | awk -F: '{print $2}'.
Module one 12. Use the RAID card number $raid_num to obtain the slot number $slot of the hard disk: ./arcconf getconfig $raid_num | grep -i report | grep -i slot | awk -F '(' '{print $1}' | awk '{print $NF}'.
Module one 13. Use the RAID card drive letter $diskname to get the corresponding sg number $sg_num: lsscsi -g | grep $diskname | awk '{print $NF}'.
Module one 14. Integrate the grabbed information and obtain the hard disk smart information with the command smartctl -d cciss,$slot -a $sg_num.
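The two lspci checks in the list above (items 4 and 9) can be combined into one classification step. The sketch below reads lspci output from standard input so it can be tried without RAID hardware; the function name classify_raid, the "none" result and the sample line are illustrative assumptions.

```shell
#!/bin/sh
# classify_raid
# Reads `lspci` output on stdin and prints:
#   lsi   if a line mentions LSI or AVAGO,
#   pmc   if a line mentions Adaptec,
#   none  otherwise (hypothetical fallback, not described in the text).
classify_raid() {
    pci=$(cat)
    if printf '%s\n' "$pci" | grep -Eq 'LSI|AVAGO'; then
        echo lsi
    elif printf '%s\n' "$pci" | grep -q 'Adaptec'; then
        echo pmc
    else
        echo none
    fi
}

# Illustrative sample line; on a real system use: lspci | classify_raid
printf '%s\n' '03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108' | classify_raid
```

The result selects which tool to install (MegaCli64 or arcconf) and which smartctl device type to use afterwards.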
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S110: installing at least smartctl tools, sdparm tools, fio components and ipmitool based on Linux, wherein validating the current test environment and obtaining smart information of all hard disks using the smartctl tools further comprises:
in response to the smart information not being viewable with the direct-connection command, further checking whether the hard disk environment is an LSI RAID or a PMC RAID environment;
in response to determining that the hard disk environment is an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the smart information of all hard disks with the LSI commands; and
in response to determining that the hard disk environment is a PMC RAID environment, installing the RAID card tool arcconf and obtaining the smart information of all hard disks with the PMC commands.
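For the LSI case, the device IDs that smartctl needs can be extracted from the MegaCli physical-drive listing. The sketch below parses such a listing from standard input; the function name parse_pdlist is hypothetical, and the sample text is an illustrative fragment that assumes the listing contains "Device Id:" and "PD Type:" lines, as the awk command in module one suggests.

```shell
#!/bin/sh
# parse_pdlist
# Reads a MegaCli64 physical-drive listing on stdin and prints one
# "device_id,pd_type" line per drive.
parse_pdlist() {
    awk -F ':' '
        /^Device Id/ { gsub(/ /, "", $2); id = $2 }
        /^PD Type/   { gsub(/ /, "", $2); print id "," $2 }
    '
}

# Illustrative sample input, not captured from a real controller.
parse_pdlist <<'EOF'
Enclosure Device ID: 32
Slot Number: 0
Device Id: 8
PD Type: SAS
Enclosure Device ID: 32
Slot Number: 1
Device Id: 9
PD Type: SATA
EOF
```

Each device ID would then be passed to smartctl as -d megaraid,ID together with the block device of the RAID controller.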
Further, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S120: confirming that the hard disk Cache is opened by using the sdparm tool or identifying the hard disk type and/or the hard disk manufacturer by using the sdparm tool so as to use the corresponding tool to open the hard disk Cache, thereby setting the queue depth and the stripe depth of the test further comprises:
setting the hard disk to JBOD mode in response to the hard disk environment being a non-direct-connection environment;
confirming whether the hard disk Cache is opened, for a hard disk in a direct-connection environment or for a hard disk in a non-direct-connection environment that has been set to JBOD mode;
in response to confirming that the hard disk Cache is not opened, acquiring the hard disk interface type from the smart information and judging the hard disk type based on the interface type;
in response to judging that the hard disk is SAS, opening the hard disk Cache with the sdparm tool and restarting to confirm that the hard disk Cache is opened;
in response to judging that the hard disk is SATA, further acquiring hard disk manufacturer information from the smart information, installing a corresponding tool according to the manufacturer information, opening the hard disk Cache and restarting to confirm that the hard disk Cache is opened; and
in response to confirming that the hard disk Cache is opened, setting the disk array card to RAID mode.
For this reason, the present embodiment mainly relates to the function of the second module and the call of the third module.
Specifically, the function of the second module is to automatically identify the hard disk interface and manufacturer and to open the hard disk cache. Fig. 3 shows a schematic flow chart of an embodiment of opening the hard disk Cache, i.e. the above-mentioned module two, according to the hard disk reliability test method of the present invention. The second module includes the following steps.
Module two 1. Install sdparm on the system.
Module two 2. Confirm the current test environment according to module one. In a non-direct-connection environment, the RAID card must be set to JBOD mode.
Module two 3. Use the sdparm tool (sdparm -S <drive letter>) to check whether the WCE of the hard disk equals 1. If WCE = 1, the hard disk cache is already opened and the module exits.
Module two 4. If WCE = 0, extract the hard disk interface $hd_type from the smart information obtained in module one using awk -F ":" '/Transport protocol:/{gsub(/ /, "", $2); print $2}' and determine whether it is a SAS interface. If it is, open the cache using sdparm -s WCE=1 -S <drive letter>, restart the system, and confirm again with sdparm -S <drive letter> that the cache is opened. The RAID card is then restored to RAID mode (no change is needed in the direct connection mode), and the module exits.
Module two 5. If step 4 determines that the hard disk has a SATA interface, capture the Device Model field in the smart information to identify the hard disk manufacturer. If the disk is a Hitachi (HGST) or WDC disk, install the sg_raw tool and open the cache using sg_raw -s 512 -i WCE.bin <drive letter>; if the disk is a Seagate disk, install the changeWCstate tool and open the cache using ./changeWCstate; if the disk is a Toshiba disk, open the cache using smartctl -s wcache-sct,on,p <drive letter>. Restart the system and confirm again with sdparm -S <drive letter> that the cache is opened. The RAID card is then restored to RAID mode (no change is needed in the direct connection mode), and the module exits.
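The decision flow of module two can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sdparm output is assumed to contain a line beginning with "WCE" followed by its value, and the manufacturer is matched only on the vendor strings named in the text, which real Device Model fields may not always contain.

```python
import re
from typing import Optional


def parse_wce(sdparm_output: str) -> Optional[int]:
    """Return the WCE value from sdparm-style output, or None if no WCE line is found."""
    match = re.search(r"^\s*WCE\s+(\d)", sdparm_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def smart_field(smart_text: str, key: str) -> Optional[str]:
    """Return the text after 'key:' on the first line whose name matches key."""
    for line in smart_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None


def cache_enable_tool(wce: int, smart_text: str) -> Optional[str]:
    """Pick the tool that module two would use to open the cache.

    Returns None when the cache is already open (WCE = 1) and
    "unsupported" when neither the interface nor the vendor is recognized.
    """
    if wce == 1:
        return None
    protocol = smart_field(smart_text, "Transport protocol") or ""
    if protocol.upper().startswith("SAS"):
        return "sdparm"
    model = (smart_field(smart_text, "Device Model") or "").upper()
    if any(tag in model for tag in ("HITACHI", "HGST", "WDC")):
        return "sg_raw"
    if "SEAGATE" in model:
        return "changeWCstate"
    if "TOSHIBA" in model:
        return "smartctl -s wcache-sct,on,p"
    return "unsupported"
```

Returning a tool label rather than a full command line is deliberate: the text does not give the complete arguments for sg_raw or changeWCstate, so the sketch does not guess them.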
The function of the third module is to combine the first module and the second module to perform random write test on the hard disk. Fig. 4 shows a schematic flow chart of an embodiment of a random write test, i.e. the above-described module three, on a hard disk according to the hard disk reliability test method of the present invention. The third module includes the following aspects.
Module three 1. Install fio and its dependency packages, and install ipmitool.
Module three 2. Before testing, open the hard disk cache according to module two.
Module three 3. Set the script queue depth to TEST_QDEPTH="8,16" and set the stripe depth to a plurality of increasing values from 512 B to 2 MB, cross-testing each combination for 15 min.
In addition, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S120: confirming that the hard disk Cache is opened by using the sdparm tool or identifying the hard disk type and/or the hard disk manufacturer by using the sdparm tool so as to use the corresponding tool to open the hard disk Cache, thereby setting the queue depth and the stripe depth of the test further comprises:
The queue depth is set to a predetermined number of values and the stripe depth is set to a plurality of increasing values within a predetermined interval, wherein each value is cross tested for a predetermined time interval.
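One possible way to enumerate the cross test is sketched below in Python. Several points are assumptions rather than statements of the patent: the stripe depth is mapped to the fio block size option, the increasing values are taken as powers of two from 512 B to 2 MiB, and the function names and the libaio engine choice are illustrative.

```python
from itertools import product
from typing import Iterator, List

QUEUE_DEPTHS = (8, 16)        # TEST_QDEPTH="8,16" in the text
MIN_BS = 512                  # 512 B
MAX_BS = 2 * 1024 * 1024      # 2 MB, taken here as 2 MiB (assumption)
RUNTIME_S = 15 * 60           # 15 min per combination


def stripe_depths(lo: int = MIN_BS, hi: int = MAX_BS) -> List[int]:
    """Increasing sizes from lo to hi, doubling each time (assumed progression)."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def fio_argv(device: str, bs: int, iodepth: int) -> List[str]:
    """Argument list for one time-based random write run with fio."""
    return [
        "fio",
        f"--name=randwrite_bs{bs}_qd{iodepth}",
        f"--filename={device}",
        "--rw=randwrite",
        "--ioengine=libaio",
        "--direct=1",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        "--time_based",
        f"--runtime={RUNTIME_S}",
    ]


def cross_run_commands(device: str) -> Iterator[List[str]]:
    """Yield one fio command per (queue depth, stripe depth) combination."""
    for iodepth, bs in product(QUEUE_DEPTHS, stripe_depths()):
        yield fio_argv(device, bs, iodepth)
```

Under these assumptions there are 13 block sizes and 2 queue depths, so one full pass consists of 26 runs of 15 min each, or 6.5 hours.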
Module three 4. During the test, continuously access the hard disk smart information according to module one.
Module three 5. During the test, use ipmitool to adjust the fan speed every 30 minutes through a preset number of increasing values, preferably 30%, 85% and 100%.
To this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S130: performing a continuous loop test based on random write processing to a hard disk by accessing smart information using the smartctl tool and/or setting a fan using the ipmitool according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition further comprises: and regulating and controlling the rotating speed of the fan by using the ipmitool according to a plurality of preset incremental values every preset time interval during the continuous cycle test.
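The fan regulation can be illustrated with a short Python sketch. It assumes that the three preferred values are applied in order, one per 30-minute interval, and that the sequence wraps around after the last value; the wrap-around, the function names and the injected set_fan callable are assumptions. The actual ipmitool command that sets the fan duty depends on the server platform and is therefore left to the caller.

```python
from typing import Callable, Sequence

FAN_DUTIES = (30, 85, 100)   # percent, preferred values in the text
STEP_MINUTES = 30            # interval between adjustments


def fan_duty_at(elapsed_minutes: float,
                duties: Sequence[int] = FAN_DUTIES,
                step: int = STEP_MINUTES) -> int:
    """Return the fan duty (percent) that applies at the given elapsed test time."""
    index = int(elapsed_minutes // step) % len(duties)
    return duties[index]


def apply_fan_schedule(elapsed_minutes: float, set_fan: Callable[[int], None]) -> int:
    """Look up the duty for the current time and hand it to a caller-supplied setter."""
    duty = fan_duty_at(elapsed_minutes)
    set_fan(duty)  # e.g. a wrapper around a platform-specific ipmitool invocation
    return duty
```

With these assumptions, the duty is 30% during minutes 0 to 29, 85% during minutes 30 to 59, 100% during minutes 60 to 89, and then returns to 30%.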
Module three 6. The following three conditions are satisfied during the test:
a. Random writes in WCE mode are cached in mDRAM (Random writes in WCE mode that are cached into mDRAM);
b. A random write FUA workload overlaps the previously written WCE data (Random write FUA workload overlapping WCE data from previous);
c. No flush (refresh) action is performed.
to this end, in some embodiments of the hard disk reliability test method 100 according to the present invention, step S130: performing a continuous loop test based on random write processing to a hard disk by accessing smart information using the smartctl tool and/or setting a fan using the ipmitool according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition further comprises:
The given pressure test conditions include that the test satisfies:
Random writes in WCE mode are cached in mDRAM;
A random write FUA workload overlaps the previously written WCE data; and
Any flush (refresh) action is disabled.
Module three 7. Continue testing for a predetermined execution time, e.g. 12 h, and then power down the machine.
Module three 8, after a predetermined pause time, e.g. 30s, the machine is awakened and the script automatically performs the test for a predetermined execution time, e.g. 12h.
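The run, power-down and pause sequence of module three 7 and 8 can be written out as a simple plan generator. This is only a sketch: the phase labels and the function names are hypothetical, and how the machine is actually powered off and woken (for example through a BMC) is not specified here.

```python
from typing import Iterator, Tuple

RUN_SECONDS = 12 * 60 * 60   # predetermined execution time, e.g. 12 h
PAUSE_SECONDS = 30           # predetermined pause time, e.g. 30 s


def power_cycle_plan(cycles: int,
                     run_s: int = RUN_SECONDS,
                     pause_s: int = PAUSE_SECONDS) -> Iterator[Tuple[str, int]]:
    """Yield (phase, duration in seconds) pairs for the requested number of cycles."""
    for _ in range(cycles):
        yield ("run_test", run_s)    # random write loop test
        yield ("power_off", 0)       # cut power while the disk is being written
        yield ("pause", pause_s)     # wait before waking the machine
        yield ("power_on", 0)        # machine wakes and the script resumes


def plan_duration(cycles: int) -> int:
    """Total planned time in seconds for the given number of cycles."""
    return sum(duration for _, duration in power_cycle_plan(cycles))
```

A driver script could walk this plan and dispatch each phase to the corresponding action, checking the hard disk for a locked state after each power-on.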
In this case, invalid address information may be written into the NVC; after the hard disk is restarted and the NVC content is read, the hard disk becomes locked. In addition, the smart information access and the fan speed changes continuously affect the performance of the hard disk, so that more problems are exposed when the test is performed in a harsh environment.
According to the embodiment of the invention, the influence of risk points in the test environment on the reading and writing of a mechanical hard disk is amplified under Linux: the application environment is automatically identified, the smart information of the hard disk is continuously accessed in a manner suited to each environment, the hard disk cache is automatically identified and opened, the fan speed is regulated, and a random write test is automatically performed on the hard disk according to the stripe depth and the queue depth. The invention therefore adopts an unconventional test method, which avoids the situation in which no problem is found because the hard disk supplier has optimized the OD/ID of the hard disk for the conventional test mode. The influence of accessing the hard disk smart information through LPC commands on hard disk reading and writing is amplified and made continuous; the influence of the fan speed on hard disk reading and writing is amplified; and the hard disk cache is opened and power is cut while the hard disk is being written, which raises the read-write speed of the hard disk and increases the risk of data loss on abnormal power-down. The influence of the test environment is thus continuously amplified, so that problems are found and solved early.
In a second aspect of the present invention, there is also provided a hard disk reliability test system 200. Fig. 5 shows a schematic block diagram of an embodiment of a hard disk reliability test system 200 according to the present invention. As shown in fig. 5, the system includes:
an environment configuration module 210, wherein the environment configuration module 210 is configured to install at least smartctl tools, sdparm tools, fio components and ipmitool based on Linux, and confirm the current test environment and acquire smart information of all hard disks by using the smartctl tools;
A condition setting module 220, wherein the condition setting module 220 is configured to confirm that the hard disk Cache is opened by using the sdparm tool or identify the hard disk type and/or the hard disk manufacturer by using the sdparm tool so as to use the corresponding tool to open the hard disk Cache, thereby setting the queue depth and the stripe depth of the test;
A persistence test module 230, the persistence test module 230 configured to perform a persistence loop test based on random write processing to a hard disk by accessing smart information with the smartctl tool and/or setting a fan with the ipmitool according to the set queue depth and stripe depth, wherein the persistence loop test satisfies a given pressure test condition;
a loop execution module 240, the loop execution module 240 configured to power down a machine after the continuous loop test has been executed for a predetermined execution time and to re-execute the continuous loop test for the predetermined execution time after a predetermined pause time has elapsed;
The result monitoring module 250, the result monitoring module 250 is configured to monitor whether the hard disk has a locking condition during all test periods to test the reliability of the hard disk.
In a third aspect of the embodiment of the present invention, a computer readable storage medium is provided, and fig. 6 is a schematic diagram of a computer readable storage medium of a hard disk reliability testing method according to an embodiment of the present invention. As shown in fig. 6, the computer-readable storage medium 300 stores computer program instructions 310, the computer program instructions 310 being executable by a processor. The computer program instructions 310, when executed, implement the method of any of the embodiments described above.
It should be understood that all of the embodiments, features and advantages set forth above for the hard disk reliability test method according to the present invention equally apply to the hard disk reliability test system and storage medium according to the present invention without conflicting with each other.
In a fourth aspect of the embodiments of the present invention, there is also provided a computer device 400 comprising a memory 420 and a processor 410, the memory having stored therein a computer program which, when executed by the processor, implements the method of any of the embodiments described above.
Fig. 7 is a schematic hardware structure diagram of an embodiment of a computer device for performing the hard disk reliability test method according to the present invention. Taking the computer device 400 shown in Fig. 7 as an example, the computer device includes a processor 410 and a memory 420, and may further include an input device 430 and an output device 440. The processor 410, memory 420, input device 430 and output device 440 may be connected by a bus or by other means; connection by a bus is taken as the example in Fig. 7. The input device 430 may receive input numeric or character information and generate signal inputs related to hard disk reliability testing. The output device 440 may include a display device such as a display screen.
The memory 420, as a non-volatile computer-readable storage medium, is used for storing non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the hard disk reliability test method in the embodiments of the present invention. The memory 420 may include a storage program area and a storage data area; the storage program area may store an operating system and at least one application program required for a function, and the storage data area may store data created by use of the hard disk reliability test method, and the like. In addition, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 410 executes various functional applications of the server and data processing, i.e., implements the methods of the method embodiments described above, by running non-volatile software programs, instructions, and modules stored in the memory 420.
Finally, it should be noted that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of example, and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A hard disk reliability test method, characterized by comprising the following steps:
Installing at least smartctl tools, sdparm tools, fio components and ipmitool based on Linux, wherein the smartctl tools are utilized to confirm the current test environment and acquire smart information of all hard disks;
Confirming that the hard disk Cache is opened by utilizing the sdparm tool or identifying the type of the hard disk and/or a hard disk manufacturer by utilizing the sdparm tool so as to use the corresponding tool to open the hard disk Cache, thereby setting the tested queue depth and the strip depth;
performing a continuous loop test based on random write processing to a hard disk by using the smartctl tool to access smart information and/or using the ipmitool to set a fan according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition;
Powering off the machine after the continuous loop test is executed for a predetermined execution time and executing the continuous loop test for the predetermined execution time again after a predetermined pause time elapses;
Monitoring whether the hard disk becomes locked during all test periods so as to test the reliability of the hard disk.
2. The method of claim 1, wherein the performing a continuous loop test based on random write processing to a hard disk by accessing smart information with the smartctl tool and/or setting a fan with the ipmitool according to the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition further comprises:
The given pressure test conditions include that the test satisfies:
Random writes in WCE mode are cached in mDRAM;
A random write FUA workload overlaps the previously written WCE data; and
Any flush (refresh) action is disabled.
3. The method of claim 1, wherein the determining, with the sdparm tool, that the hard disk Cache is opened or identifying, with the sdparm tool, a hard disk type and/or a hard disk vendor to open the hard disk Cache with a corresponding tool, thereby setting a queue depth and a stripe depth of the test further comprises:
The queue depth is set to a predetermined number of values and the stripe depth is set to a plurality of increasing values within a predetermined interval, wherein each value is cross tested for a predetermined time interval.
4. A method according to any one of claims 1 to 3, wherein the Linux-based installation of at least smartctl tools, sdparm tools and fio components and ipmitool, wherein validating the current test environment and obtaining smart information for all hard disks using the smartctl tools further comprises:
Detecting whether the smart information can be viewed using the smartctl tool so as to judge whether the hard disk environment is a direct connection environment;
In response to the smart information being viewed normally, judging the hard disk environment to be a direct connection environment and capturing the smart information of all hard disks through the direct-connection command.
5. The method of claim 4, wherein the Linux-based installation of at least smartctl tools, sdparm tools, and fio components and ipmitool, wherein validating a current test environment and obtaining smart information for all hard disks using the smartctl tools further comprises:
In response to being unable to view the smart information with the direct-connection command, further checking whether the hard disk environment is an LSI RAID environment or a PMC RAID environment;
In response to determining that the hard disk environment is an LSI RAID environment, installing the RAID card tool MegaCli64 and obtaining the smart information of all hard disks using LSI commands;
In response to determining that the hard disk environment is a PMC RAID environment, installing the RAID card tool arcconf and obtaining the smart information of all hard disks using PMC commands.
6. The method of claim 5, wherein the determining, with the sdparm tool, that the hard disk Cache is opened or identifying, with sdparm tool, a hard disk type and/or a hard disk vendor to open the hard disk Cache with a corresponding tool, thereby setting a queue depth and a stripe depth of a test further comprises:
Setting the hard disk to JBOD mode in response to the hard disk environment being a non-direct-connection environment;
Confirming whether the hard disk Cache is opened for a hard disk in a direct connection environment or for a hard disk in a non-direct-connection environment that has been set to JBOD mode;
In response to confirming that the hard disk Cache is not opened, acquiring the hard disk interface type from the smart information and judging the hard disk type based on the interface type;
In response to judging that the hard disk is SAS, opening the hard disk Cache using the sdparm tool and restarting to confirm that the hard disk Cache is opened;
In response to judging that the hard disk is SATA, further acquiring hard disk manufacturer information from the smart information, installing the corresponding tool according to the manufacturer information, opening the hard disk Cache and restarting to confirm that the hard disk Cache is opened;
In response to confirming that the hard disk Cache is opened, setting the disk array card to RAID mode.
7. A method according to any one of claims 1 to 3, wherein said performing a continuous loop test based on random write processing to a hard disk by accessing smart information with said smartctl tool and/or setting a fan with said ipmitool according to a set queue depth and stripe depth, wherein said continuous loop test satisfying a given pressure test condition further comprises:
Regulating the fan speed using the ipmitool according to a plurality of preset incremental values at every preset time interval during the continuous loop test.
8. A hard disk reliability test system, comprising:
The environment configuration module is configured to install at least smartctl tools, sdparm tools, fio components and ipmitool based on Linux, wherein the smartctl tools are utilized to confirm the current test environment and acquire smart information of all hard disks;
The condition setting module is configured to confirm that the hard disk Cache is opened by utilizing the sdparm tool or identify the type of the hard disk and/or a hard disk manufacturer by utilizing the sdparm tool so as to use the corresponding tool to open the hard disk Cache, thereby setting the queue depth and the stripe depth of the test;
A continuous test module configured to perform a continuous loop test based on random write processing to a hard disk by accessing smart information with the smartctl tool and/or setting a fan with the ipmitool in accordance with the set queue depth and stripe depth, wherein the continuous loop test satisfies a given pressure test condition;
A cycle execution module configured to power down a machine after the continuous cycle test executes a predetermined execution time and to execute the continuous cycle test for the predetermined execution time again after a predetermined pause time has elapsed;
And the result monitoring module is configured to monitor whether the hard disk is in a locking condition or not in all test periods so as to test the reliability of the hard disk.
9. A computer readable storage medium, characterized in that computer program instructions are stored, which when executed implement the hard disk reliability test method according to any of claims 1-7.
10. A computer device comprising a memory and a processor, wherein the memory has stored therein a computer program which, when executed by the processor, performs the hard disk reliability test method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210747664.9A CN115114053B (en) 2022-06-29 2022-06-29 Hard disk reliability test method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN115114053A CN115114053A (en) 2022-09-27
CN115114053B true CN115114053B (en) 2024-05-14

Family

ID=83330224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210747664.9A Active CN115114053B (en) 2022-06-29 2022-06-29 Hard disk reliability test method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115114053B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899119A (en) * 2015-05-21 2015-09-09 浪潮电子信息产业股份有限公司 Method for automatically detecting hard disk abnormity
CN110413463A (en) * 2019-06-29 2019-11-05 苏州浪潮智能科技有限公司 A kind of SMART information inspection method of hard disk
CN110992992A (en) * 2019-10-31 2020-04-10 苏州浪潮智能科技有限公司 Hard disk test method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant