CN112416639B - Slow disk detection method, device, equipment and storage medium

Info

Publication number: CN112416639B
Application number: CN202011278532.3A
Authority: CN
Inventors: 周永洪
Original assignee: New H3C Technologies Co Ltd Chengdu Branch
Current assignee: New H3C Technologies Co Ltd Chengdu Branch
Priority date: 2020-11-16
Filing date: 2020-11-16
Publication date: 2022-08-23
Anticipated expiration: 2040-11-16
Also published as: CN112416639A

Abstract

The application provides a slow disk detection method, apparatus, device, and storage medium. The method comprises: acquiring the read-write time delay of each read-write request on a target hard disk within a preset time period; and determining that the target hard disk is a slow disk when the read-write time delays of the read-write requests on the target hard disk within the preset time period meet a first condition and/or a second condition. The first condition includes: the ratio of the number of slow requests in the preset time period to the total number of read-write requests in the preset time period is greater than or equal to a first preset ratio. The second condition includes: the average service time in the preset time period is greater than or equal to a preset service time threshold. With this scheme, a hard disk with poor read-write performance in the storage system can be found in time and isolated from the hard disk cluster, which solves the problem of system performance degradation caused by the presence of a slow disk.

Description

Slow disk detection method, device, equipment and storage medium

Technical Field

The present application relates to the field of storage technologies, and in particular, to a slow disk detection method, apparatus, device, and storage medium.

Background

In the use process of the hard disk, due to magnetic degradation, track damage or vibration of the hard disk, the time for the hard disk to respond to an input/output (I/O) request (or read/write request) is increased relative to the rated I/O request response time, and such a hard disk with an increased I/O request response time is called a slow disk.

In a distributed storage system, to ensure data security, data must be backed up using techniques such as replication or erasure coding, and the minimum disaster tolerance domain is at least at the host level. This scheme is excellent in terms of data security, but the physical hard disks that serve as the underlying storage medium of the storage system fail with some probability, and slow disks are a normal occurrence in a large-scale cluster that runs under high load for a long time. Due to the nature of distributed storage, the presence of a slow disk affects the performance of the overall storage system. Therefore, for the distributed storage system to deliver stable performance, it is necessary to discover slow disks in the system and isolate them from the hard disk cluster in time.

At present, hard disk failures, bad disks, and bad tracks are detected by having the underlying driver return a corresponding error code to the application layer. The storage cluster can uniquely identify the hard disk fault from the error code and take different avoidance or repair measures for different faults. Because this kind of fault detection determines the fault type from the error code returned by the underlying layer, it is suitable only for clearly defined faults and cannot be used for a slow disk, which is a hard disk fault that is not clearly defined.

Disclosure of Invention

The application aims to provide a slow disk detection method, apparatus, device, and storage medium that discover a hard disk with poor read-write performance in a storage system in time and isolate it from the hard disk cluster, thereby solving the problem of system performance degradation caused by the presence of a slow disk.

A first aspect of the present application provides a slow disk detection method, which is applied to a distributed storage device, and includes:

acquiring read-write time delay of each read-write request on a target hard disk within a preset time period; the read-write time delay is the time from the beginning when the hard disk receives the read-write request to the end when the hard disk responds to the read-write request and returns the read-write result;

determining that the target hard disk is a slow disk according to the fact that the read-write time delay of each read-write request on the target hard disk in the preset time period meets a first condition and/or a second condition;

the first condition includes: if the read-write time delay of the read-write request exceeds a first time delay threshold, determining the read-write request to be a slow request; if the ratio of the number of the slow requests in the preset time period to the total number of all the read-write requests in the preset time period is greater than or equal to a first preset ratio, determining that the target hard disk is a slow disk;

the second condition includes: in the preset time period, the time from the starting time of the first read-write request to the ending time of the last read-write request is the total service time, and the ratio of the total service time to the total number of all the read-write requests in the preset time period is the average service time; and if the average service time is greater than or equal to a preset service time threshold, determining that the target hard disk is a slow disk.

A second aspect of the present application provides a slow disk detection apparatus, which is applied to a distributed storage device, and includes:

the acquisition module is used for acquiring the read-write time delay of each read-write request on the target hard disk within a preset time period; the read-write time delay is the time from the beginning of the hard disk when receiving the read-write request to the end of the hard disk when responding to the read-write request and returning the read-write result;

the determining module is used for determining the target hard disk as a slow disk according to the fact that the read-write time delay of each read-write request on the target hard disk in the preset time period meets a first condition and/or a second condition;

A third aspect of the present application provides a slow disk detection device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method described in the first aspect when executing the computer program.

A fourth aspect of the present application provides a computer readable storage medium having computer readable instructions stored thereon which are executable by a processor to implement the method as described in the first aspect.

Compared with the prior art, the slow disk detection method, apparatus, device, and storage medium provided by the present application acquire the read-write time delay of each read-write request on the target hard disk within a preset time period, and determine that the target hard disk is a slow disk when the read-write time delays of the read-write requests on the target hard disk within the preset time period meet a first condition and/or a second condition. The first condition includes: the ratio of the number of slow requests in the preset time period to the total number of read-write requests in the preset time period is greater than or equal to a first preset ratio. The second condition includes: the average service time in the preset time period is greater than or equal to a preset service time threshold. With this scheme, a hard disk with poor read-write performance in the storage system can be found in time and isolated from the hard disk cluster, which solves the problem of system performance degradation caused by the presence of a slow disk.

Drawings

Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates a flow chart of a slow disk detection method provided by some embodiments of the present application;

FIG. 2 is a schematic diagram illustrating the correspondence between the read-write time delay of a read-write request and a preset period in the present application;

FIG. 3 is a schematic diagram illustrating the correspondence between the read-write time delay of a read-write request and the total service time in the present application;

FIG. 4 illustrates a schematic diagram of a slow disk detection apparatus provided by some embodiments of the present application;

FIG. 5 illustrates a schematic diagram of a slow disk detection device provided by some embodiments of the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.

In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

In a distributed storage system, physical hard disks serve as the underlying storage medium of the storage system; they fail with some probability, and slow disks are a normal occurrence in a large-scale cluster that runs under high load for a long time. Due to the nature of distributed storage, if even one slow disk exists, the performance of the entire storage system is affected.

In view of the above, embodiments of the present application provide a slow disk detection method, a slow disk detection apparatus, a slow disk detection device, and a computer-readable storage medium, which are described below with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a slow disk detection method provided in some embodiments of the present application is shown, where the method is applied in a distributed storage device (or system) for timely detecting a slow disk in the distributed storage system, and as shown in the figure, the method may specifically include the following steps:

step S101: acquiring read-write time delay of each read-write request on a target hard disk within a preset time period; the read-write time delay is the time from the time when the hard disk receives the read-write request to the time when the hard disk responds to the read-write request and returns the read-write result.

For example, a distributed storage system (hereinafter referred to as the system) includes a plurality of distributed storage nodes, where a storage node may be understood as a hard disk for storing data, and the plurality of hard disks form the hard disk cluster of the system. The read-write time delay is the time taken from issuing a read-write request to a hard disk until the hard disk returns the read-write result for that request. That is, the system issues a read-write request (i.e., an I/O request) to a hard disk in the hard disk cluster, and the hard disk returns the corresponding read-write result (i.e., the I/O result) after processing the request. By recording a timestamp when the I/O request is issued and another when the I/O result is returned, the delay of each I/O can be obtained, such as delay0, delay1, delay2, and delay3 shown in fig. 2.

In this step, the read-write time delay of the read-write request on each hard disk is recorded in real time, the preset time period is a time period for slow disk detection, and the read-write time delay of the read-write request on each hard disk recorded in the time period is obtained.
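The timestamp-based recording in step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `IoRecord`, `LatencyRecorder`, `timed_io`, and `window` are hypothetical, and the sketch assumes latency is measured around a synchronous call using a monotonic clock.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IoRecord:
    start: float  # timestamp taken when the I/O request is issued (seconds)
    end: float    # timestamp taken when the I/O result is returned (seconds)

    @property
    def latency(self) -> float:
        return self.end - self.start


@dataclass
class LatencyRecorder:
    """Hypothetical per-disk recorder; all names here are illustrative."""
    records: dict = field(default_factory=dict)  # disk id -> list of IoRecord

    def timed_io(self, disk_id, io_func, clock=time.monotonic):
        # Stamp the request before it is issued and after the result returns.
        start = clock()
        result = io_func()
        end = clock()
        self.records.setdefault(disk_id, []).append(IoRecord(start, end))
        return result

    def window(self, disk_id, t0, t1):
        # Records of one disk that started and finished inside [t0, t1],
        # i.e. the preset time period used for slow disk detection.
        return [r for r in self.records.get(disk_id, [])
                if r.start >= t0 and r.end <= t1]
```

A detector would call `window(disk_id, t0, t1)` once per detection run to obtain the delays recorded for the target hard disk in the preset time period.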

Step S102: and determining that the target hard disk is a slow disk according to the fact that the read-write time delay of each read-write request on the target hard disk in the preset time period meets a first condition and/or a second condition.

Wherein the first condition comprises: if the read-write time delay of the read-write request exceeds a first time delay threshold, determining the read-write request to be a slow request; if the ratio of the number of the slow requests in the preset time period to the total number of all the read-write requests in the preset time period is greater than or equal to a first preset ratio, determining that the target hard disk is a slow disk;

According to some embodiments of the present application, the step S102 may be implemented as:

step S201: dividing the preset time period into a plurality of preset periods, and determining the read-write time delay of the read-write request corresponding to each preset period;

step S202: and determining the target hard disk as a slow disk according to the fact that the read-write time delay of the read-write request corresponding to each preset period meets a first condition and/or a second condition.

In the step S201, a corresponding relationship between the read-write time delay of each read-write request and each preset period is determined; each preset period is obtained by dividing a preset time period into a plurality of continuous preset periods.

For example, if the preset time period is 10 minutes, the 10 minutes can be divided into 10 periods of 1 minute each, where 1 minute is a preset period, such as the period T in fig. 2. The correspondence between the read-write time delay of each read-write request and each preset period is then determined, that is, it is determined which period T each of the read-write time delays delay0, delay1, delay2, and delay3 belongs to. A read-write time delay that spans two periods can be discarded and is not assigned to any period T.
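The division into preset periods can be sketched as below. The function name `assign_to_periods` and its parameters are hypothetical; the sketch assumes each request is given as a `(start, end)` timestamp pair in seconds and treats each period as a half-open interval, so a request that ends exactly on a period boundary counts as cross-period and is discarded.

```python
def assign_to_periods(requests, window_start, period_len, num_periods):
    """Group request latencies by the preset period they fall into.

    requests: iterable of (start, end) timestamps in seconds.
    Returns a list with one list of latencies per preset period.
    """
    periods = [[] for _ in range(num_periods)]
    for start, end in requests:
        idx_start = int((start - window_start) // period_len)
        idx_end = int((end - window_start) // period_len)
        if idx_start != idx_end:
            continue  # cross-period request: discarded, belongs to no period
        if 0 <= idx_start < num_periods:
            periods[idx_start].append(end - start)
    return periods
```

With a 10-minute window, `assign_to_periods(requests, 0.0, 60.0, 10)` yields ten lists, one per 1-minute period T.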

In the step S202, it is determined that the target hard disk is a slow disk according to that the read-write latency of the read-write request corresponding to each preset period satisfies a preset first condition and/or a preset second condition.

In this step, the first condition specifically includes:

if the read-write time delay of the read-write request exceeds a first time delay threshold, determining the read-write request as a slow request;

if the proportion of the number of the slow requests in the preset period to the total number of all the read-write requests in the preset period is larger than or equal to a first preset proportion, determining that the preset period is the slow period;

and if the proportion of the number of the slow cycles in the preset time period to the total number of all the preset cycles in the preset time period is greater than or equal to a second preset proportion, determining that the target hard disk is a slow disk.

For example, in a preset period T, an I/O request whose read-write latency exceeds the first latency threshold is called a slow I/O. If the ratio of the number of slow I/Os in a period to the total number of I/O requests in that period (this total is not fixed, because the number of I/O requests varies from period to period) is greater than or equal to a first preset ratio r, the period is considered a slow period. If at least m of n consecutive periods are slow periods, the hard disk is determined to be a slow disk, and the ratio of m to n is the second preset ratio.
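The first condition can be sketched as two small functions. The names are hypothetical, and treating a period with no I/O as "not slow" is an assumption of this sketch that the text does not specify.

```python
def is_slow_period_by_ratio(latencies, latency_threshold, r):
    """First condition, per period: share of slow I/Os is at least r."""
    if not latencies:
        return False  # assumption: a period without I/O is not a slow period
    slow = sum(1 for d in latencies if d > latency_threshold)
    return slow / len(latencies) >= r


def is_slow_disk(slow_flags, m):
    """Disk-level verdict: at least m of the n consecutive periods are slow.

    slow_flags: list of booleans, one per preset period (n = len(slow_flags)).
    """
    return sum(slow_flags) >= m
```

Here `m / len(slow_flags)` plays the role of the second preset ratio.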

In this step, the second condition specifically includes:

in a preset period, the time from the starting time of a first read-write request to the ending time of a last read-write request is the total service time, and the ratio of the total service time to the total number of all the read-write requests in the preset period is the average service time;

if the average service time is greater than or equal to a preset service time threshold, determining that the preset period is a slow period;

The average service time refers to a delay of a single I/O in a period of time when the hard disk is busy, see fig. 3, where the average service time svctm is t/3. In the figure, t1, t2, and t3 are read/write delays of three read/write requests in a preset period, and t is a total service time.

For example, in a preset period T, a parameter of an average service time svctm is calculated, and if svctm exceeds a preset service time threshold s, the period is considered as a slow period; if at least m of the n consecutive cycles are slow cycles, the hard disk is determined to be a slow disk.
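The average service time of one period can be sketched as follows. The function names are hypothetical; the sketch assumes the "first" request is the one with the earliest start and the "last" request is the one with the latest end, and it returns `None` for a period without requests.

```python
def average_service_time(requests):
    """svctm for one period: (latest end - earliest start) / request count.

    requests: list of (start, end) timestamps for one preset period.
    """
    if not requests:
        return None  # assumption: undefined when the period has no I/O
    first_start = min(start for start, _ in requests)
    last_end = max(end for _, end in requests)
    return (last_end - first_start) / len(requests)


def is_slow_period_by_svctm(requests, s):
    """Second condition, per period: svctm is at least the threshold s."""
    svctm = average_service_time(requests)
    return svctm is not None and svctm >= s
```

For three overlapping requests as in fig. 3, this computes t/3, where t is the total service time.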

Therefore, in step S202, when the read-write latency of the read-write request corresponding to each preset period of the target hard disk meets the first condition or the second condition, it may be determined that the target hard disk is a slow disk.

However, the first condition detects directly from the I/O latency dimension. This appears to detect a slow disk directly from its characteristics, but it ignores an important factor: I/O request concurrency. For example, suppose an I/O whose latency exceeds 100 ms is considered a slow I/O. If each I/O takes about 100 ms, then with an I/O concurrency of 1, processing 100 I/Os takes 10 s; if the I/O concurrency is raised to 10, processing the same 100 I/Os takes only 1 s, in which case the target hard disk should not be marked as a slow disk. The average service time detection dimension is intended to solve the slow disk detection problem in this scenario.

Referring to fig. 3, the average read-write time delay is $(t_1 + t_2 + t_3)/3$, whereas the average service time is $\mathit{svctm} = t/3$.

It can be seen that when the I/O concurrency is greater than 1, the requests overlap in time, so $t < t_1 + t_2 + t_3$ and the average service time svctm is less than the average read-write time delay:

$$\mathit{svctm} = \frac{t}{3} < \frac{t_1 + t_2 + t_3}{3}$$

The higher the concurrency, the greater the difference. Generally, in a high-concurrency scenario, the hard disk is busy and the time for processing each I/O request is prolonged. Introducing the average service time solves the problem of false alarms that arise when slow disks are detected by I/O latency alone and a hard disk is merely busy due to high service pressure.
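The effect of concurrency on the two metrics can be checked with a small simulation of the 100-I/O example. This is an idealized sketch: `simulate` is a hypothetical helper that assumes every I/O takes exactly the same time and that requests run in lockstep batches of size `concurrency`; times are in milliseconds.

```python
def simulate(num_ios, latency_ms, concurrency):
    """Return (start, end) pairs for an idealized lockstep schedule."""
    requests = []
    for i in range(num_ios):
        batch = i // concurrency          # requests in one batch run together
        start = batch * latency_ms
        requests.append((start, start + latency_ms))
    return requests


def mean_latency(requests):
    return sum(end - start for start, end in requests) / len(requests)


def svctm(requests):
    span = max(e for _, e in requests) - min(s for s, _ in requests)
    return span / len(requests)


serial = simulate(100, 100, 1)      # concurrency 1: total span is 10 s
parallel = simulate(100, 100, 10)   # concurrency 10: total span is 1 s
```

In this model the mean latency is 100 ms in both runs, while svctm is 100 ms for the serial run and 10 ms for the concurrent run, which is the gap the second condition relies on.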

Therefore, in other embodiments of the present application, when the first condition and the second condition are simultaneously satisfied, it may be determined that the target hard disk is a slow disk, and the slow disk determination of the embodiment is more accurate.
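The two embodiments (either condition, or both conditions together) can be sketched in one function. The name `detect_slow_disk` and the flag `require_both` are hypothetical, and the sketch assumes each condition is evaluated per period and then reduced to a disk-level verdict by the same m-of-n rule.

```python
def detect_slow_disk(periods, latency_threshold, r, s, m, require_both=False):
    """Combine the first and second conditions over n preset periods.

    periods: list of per-period request lists, each request a (start, end) pair.
    require_both=False: a disk is slow if either condition holds.
    require_both=True:  a disk is slow only if both conditions hold.
    """
    ratio_slow = 0  # periods that are slow under the first condition
    svctm_slow = 0  # periods that are slow under the second condition
    for requests in periods:
        if not requests:
            continue  # assumption: periods without I/O are never slow
        count = len(requests)
        slow_ios = sum(1 for st, en in requests if en - st > latency_threshold)
        if slow_ios / count >= r:
            ratio_slow += 1
        span = max(en for _, en in requests) - min(st for st, _ in requests)
        if span / count >= s:
            svctm_slow += 1
    first = ratio_slow >= m
    second = svctm_slow >= m
    return (first and second) if require_both else (first or second)
```

A busy but healthy disk with many overlapping requests can satisfy the first condition alone, so requiring both conditions filters out that kind of false alarm.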

According to some embodiments of the present application, after the step S102 determines that the target hard disk is a slow disk, the method further includes: and isolating the target hard disk from a hard disk cluster of the distributed storage system.

According to the slow disk detection method provided by the embodiments of the present application, the read-write time delay of each read-write request on the target hard disk within a preset time period is acquired, and the target hard disk is determined to be a slow disk when the read-write time delays of the read-write requests on the target hard disk within the preset time period meet a first condition and/or a second condition. The first condition includes: the ratio of the number of slow requests in the preset time period to the total number of read-write requests in the preset time period is greater than or equal to a first preset ratio. The second condition includes: the average service time in the preset time period is greater than or equal to a preset service time threshold. With this scheme, a hard disk with poor read-write performance in the storage system can be found in time and isolated from the hard disk cluster, which solves the problem of system performance degradation caused by the presence of a slow disk.

In the foregoing embodiment, a slow disc detection method is provided, and correspondingly, the present application also provides a slow disc detection apparatus. The slow disc detection device provided by the embodiment of the application can implement the slow disc detection method. Please refer to fig. 4, which illustrates a schematic diagram of a slow disc detection apparatus provided in some embodiments of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.

As shown in fig. 4, the slow disk detection apparatus 10, applied to a distributed storage device, may include:

an obtaining module 101, configured to obtain, within a preset time period, read-write delays of read-write requests on a target hard disk; the read-write time delay is the time from the beginning when the hard disk receives the read-write request to the end when the hard disk responds to the read-write request and returns the read-write result;

a determining module 102, configured to determine that a target hard disk is a slow disk according to that read-write latency of each read-write request on the target hard disk in the preset time period meets a first condition and/or a second condition;

In some implementations of the embodiments of the present application, the determining module 102 is specifically configured to:

dividing the preset time period into a plurality of preset periods, and determining the read-write time delay of the read-write request corresponding to each preset period;

and determining the target hard disk as a slow disk according to the fact that the read-write time delay of the read-write request corresponding to each preset period meets a first condition and/or a second condition.

In some implementations of embodiments of the present application, the first condition specifically includes:

if the proportion of the number of the slow requests in the preset period to the total number of all the read-write requests in the preset period is larger than or equal to a first preset proportion, determining the preset period as a slow period;

In some implementations of embodiments of the present application, the second condition specifically includes:

in a preset period, the time from the starting moment of the first read-write request to the ending moment of the last read-write request is the total service time, and the ratio of the total service time to the total number of all the read-write requests in the preset period is the average service time;

if the average service time is greater than or equal to the preset service time threshold, determining the preset period as a slow period;

In some implementations of the embodiments of the present application, the target hard disk is any hard disk in a hard disk cluster of a distributed storage system; the apparatus 10 further comprises:

an isolation module, configured to isolate the target hard disk from the hard disk cluster of the distributed storage system after the determining module determines that the target hard disk is a slow disk.

The slow disk detection apparatus provided by the above embodiment of the present application is based on the same inventive concept as the slow disk detection method provided by the embodiments of the present application and has the same beneficial effects.

The present application further provides a slow disc detection device corresponding to the slow disc detection method provided in the foregoing embodiments, where the device may be an electronic device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, to execute the slow disc detection method.

Please refer to fig. 5, which illustrates a schematic diagram of a slow disk detection device provided in some embodiments of the present application. As shown in fig. 5, the slow disk detection device 20 includes: a processor 200, a memory 201, a bus 202, and a communication interface 203, where the processor 200, the communication interface 203, and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to perform the slow disk detection method provided in any of the foregoing embodiments.

The Memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like may be used.

Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction, and the slow disc detection method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200, or implemented by the processor 200.

The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.

The slow disc detection device provided by the embodiment of the application and the slow disc detection method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the slow disc detection device.

The present application further provides a computer-readable storage medium corresponding to the slow disc detection method provided in the foregoing embodiments, where the computer-readable storage medium may be an optical disc, and a computer program (i.e., a program product) is stored on the optical disc, and when the computer program is executed by a processor, the computer program executes the slow disc detection method provided in any of the foregoing embodiments.

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the slow disc detection method provided by the embodiment of the present application have the same beneficial effects as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.

Finally, it should be noted that: the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure, and the present disclosure should be construed as being covered by the claims and the specification.

Claims

1. A slow disk detection method is applied to a distributed storage device, and comprises the following steps:

determining the target hard disk as a slow disk according to the fact that the read-write time delay of the read-write request corresponding to each preset period meets a first condition and/or a second condition;

the first condition includes: if the read-write time delay of the read-write request exceeds a first time delay threshold value, determining the read-write request to be a slow request;

if the proportion of the number of the slow cycles in the preset time period to the total number of all the preset cycles in the preset time period is greater than or equal to a second preset proportion, determining that the target hard disk is a slow disk;

the second condition includes: in the preset time period, the time from the starting time of the first read-write request to the ending time of the last read-write request is the total service time, and the ratio of the total service time to the total number of all the read-write requests in the preset time period is the average service time;

in the preset period, if the average service time is greater than or equal to a preset service time threshold, determining that the preset period is a slow period;

2. A slow disk detection device is applied to distributed storage equipment and comprises:

the acquisition module is used for acquiring the read-write time delay of each read-write request on the target hard disk within a preset time period; the read-write time delay is the time from the beginning when the hard disk receives the read-write request to the end when the hard disk responds to the read-write request and returns the read-write result;

the determining module is used for dividing the preset time period into a plurality of preset cycles and determining the read-write time delay of the read-write request corresponding to each preset cycle; determining the target hard disk as a slow disk according to the fact that the read-write time delay of the read-write request corresponding to each preset period meets a first condition and/or a second condition;

the first condition includes: if the read-write time delay of the read-write request exceeds a first time delay threshold, determining the read-write request to be a slow request;

3. A slow disk detection device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method as claimed in claim 1 when executing the computer program.

4. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of claim 1.