WO2017020614A1 - Method and apparatus for detecting a disk - Google Patents

Method and apparatus for detecting a disk

Info

Publication number
WO2017020614A1
Authority
WO
WIPO (PCT)
Prior art keywords
response time
real
data
time
predetermined distance
Prior art date
Application number
PCT/CN2016/080376
Other languages
English (en)
French (fr)
Inventor
李静辉
张金冬
黄澄
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP16832103.2A (EP3321807B1)
Publication of WO2017020614A1
Priority to US15/883,029 (US10768826B2)
Priority to US17/001,594 (US20200387311A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3485 Performance evaluation by tracing or monitoring for I/O devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • The present invention relates to the field of electronic technologies, and in particular, to a method and apparatus for detecting a disk.
  • Magnetic degradation, bad sectors, vibration, or other mechanical and environmental problems can lengthen the time a disk takes to respond to read and write input/output (I/O) requests.
  • I/O: Input/Output
  • A disk that responds too slowly to an I/O request is called a slow disk.
  • Slow disks are a major threat to the reliability of storage systems.
  • RAID: Redundant Arrays of Inexpensive Disks
  • In storage systems such as RAID and distributed storage systems, the slowdown of one disk may degrade system-wide performance and, in severe cases, even interrupt services. Disks therefore need to be checked so that appropriate measures, such as isolating the slow disk and backing up its data, can be taken in time.
  • Embodiments of the present invention provide a method and apparatus for detecting a disk to detect whether a slow disk is present.
  • a first aspect of the present invention provides a method of detecting a disk, including:
  • the N I/O related indicators include the I/O response time of the disk and an indicator that affects the I/O response time;
  • the I/O response time is the time from when an application issues an operation request to when the disk's response to the request is received;
  • N is an integer greater than or equal to 2;
  • an abnormal I/O response time indicates that the disk cannot run services normally;
  • a normal I/O response time indicates that the disk can run services normally;
  • if the I/O response time is abnormal, a detection result is output; the detection result indicates that the I/O response time is abnormal.
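As an illustration only, the overall flow of the first aspect (collect N values, decide whether the I/O response time is abnormal, output a detection result) might be sketched in Python as follows. The names `DetectionResult`, `detect_disk`, and `simple_rule`, the numeric limits, and the choice of a load value as the second indicator are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class DetectionResult:
    """Hypothetical container for the detection result that is output."""
    disk_id: str
    sample: tuple
    message: str


def detect_disk(
    disk_id: str,
    sample: Sequence[float],
    is_abnormal: Callable[[Sequence[float]], bool],
) -> Optional[DetectionResult]:
    """Return a result only when the I/O response time is judged abnormal."""
    if len(sample) < 2:
        raise ValueError("N must be an integer greater than or equal to 2")
    if is_abnormal(sample):
        return DetectionResult(disk_id, tuple(sample), "I/O response time abnormal")
    return None


# Illustrative rule (assumption): index 0 is the response time in ms,
# index 1 is a load indicator; slow responses under moderate load are abnormal.
def simple_rule(sample):
    return sample[0] > 50.0 and sample[1] < 1000.0


slow = detect_disk("disk-0", [80.0, 200.0], simple_rule)
healthy = detect_disk("disk-1", [5.0, 200.0], simple_rule)
```

The decision rule is passed in as a function so that any of the implementations described in the following bullets could be plugged in without changing the collection and reporting steps.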
  • the determining, according to the N real-time data, whether the I/O response time is abnormal includes:
  • determining which of the at least two preset intervals of each of the remaining N-1 I/O related indicators (those other than the I/O response time) the corresponding real-time data falls in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively;
  • the at least two preset intervals of each of the N-1 I/O related indicators are: at least two subintervals obtained by dividing the full range between the first value and the second value that the I/O related indicator can support;
  • the I/O response time threshold is less than or equal to: the maximum I/O response time at which the disk runs normally when the N-1 real-time data are in their corresponding preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, it is determined that the I/O response time is abnormal.
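The interval-based check above can be pictured as a lookup table keyed by the interval each non-response-time indicator falls in. The sketch below assumes N = 3 with two hypothetical indicators (IOPS and block size); the interval edges and the per-combination thresholds are made-up values for illustration, not figures from the patent.

```python
from bisect import bisect_right

# Assumed interval edges for the two indicators that affect response time.
IOPS_EDGES = [100, 500]   # intervals: [0, 100), [100, 500), [500, inf)
BLOCK_KB_EDGES = [64]     # intervals: [0, 64), [64, inf)

# Assumed response time threshold (ms) for each combination of intervals.
THRESHOLD_MS = {
    (0, 0): 20.0, (0, 1): 40.0,
    (1, 0): 35.0, (1, 1): 70.0,
    (2, 0): 60.0, (2, 1): 120.0,
}


def interval_index(value, edges):
    """Index of the preset interval that contains value."""
    return bisect_right(edges, value)


def is_response_time_abnormal(response_ms, iops, block_kb):
    """Compare the response time with the threshold of the matching intervals."""
    key = (interval_index(iops, IOPS_EDGES),
           interval_index(block_kb, BLOCK_KB_EDGES))
    return response_ms > THRESHOLD_MS[key]
```

With these assumed numbers, a 30 ms response is abnormal at 50 IOPS with 4 KB blocks (threshold 20 ms) but normal at 200 IOPS with the same block size (threshold 35 ms), which is the point of making the threshold depend on the other indicators.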
  • the determining, according to the N real-time data, whether the I/O response time is abnormal includes:
  • the N real-time data are used as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
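A minimal sketch of the cluster-center check follows. It makes two simplifying assumptions that the text does not specify: the cluster center is taken as the plain mean of all M historical points (a single cluster), and the distance is Euclidean. The function names and the sample history are hypothetical.

```python
import math


def cluster_center(points):
    """Mean of the historical points (single-cluster simplification)."""
    dims = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dims)]


def predetermined_distance(points, center, acceptable_prob):
    """Smallest historical distance d for which the fraction of the M
    historical points lying farther than d is below acceptable_prob."""
    dists = [math.dist(p, center) for p in points]
    for d in sorted(dists):
        if sum(1 for x in dists if x > d) / len(dists) < acceptable_prob:
            return d
    raise ValueError("acceptable_prob must be greater than 0")


def is_abnormal(sample, center, limit):
    """True when the new point lies farther from the center than the limit."""
    return math.dist(sample, center) > limit


# Hypothetical history: (response time in ms, load indicator).
history = [(10, 100), (12, 110), (11, 105), (9, 95), (13, 90)]
center = cluster_center(history)
limit = predetermined_distance(history, center, acceptable_prob=0.25)
```

In practice the indicators have different units, so some normalization before computing distances would probably be needed; that step is left out here because the text does not describe it.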
  • the determining, according to the N real-time data, whether the I/O response time is abnormal includes:
  • the N real-time data are used as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, it is determined that the I/O response time is abnormal.
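The combined condition (far from the cluster center and above the response time threshold) could look like the sketch below. As assumptions for illustration, the cluster center is the mean of the history, the distance is Euclidean, and both limits are derived from the same acceptable probability; `quantile_limit` and `is_abnormal` are hypothetical names.

```python
import math


def quantile_limit(values, acceptable_prob):
    """Smallest observed value v such that the fraction of values
    greater than v is below acceptable_prob."""
    for v in sorted(values):
        if sum(1 for x in values if x > v) / len(values) < acceptable_prob:
            return v
    raise ValueError("acceptable_prob must be greater than 0")


def is_abnormal(sample, history, acceptable_prob, rt_index=0):
    """Abnormal only if BOTH the distance and the response time exceed
    the limits derived from the M historical points."""
    dims = len(sample)
    center = [sum(p[i] for p in history) / len(history) for i in range(dims)]
    dist_limit = quantile_limit([math.dist(p, center) for p in history],
                                acceptable_prob)
    rt_limit = quantile_limit([p[rt_index] for p in history], acceptable_prob)
    return (math.dist(sample, center) > dist_limit
            and sample[rt_index] > rt_limit)


# Hypothetical history: (response time in ms, load indicator).
history = [(10, 100), (12, 110), (11, 105), (9, 95), (13, 90)]
```

Requiring both conditions means a point that is far from the center only because the load changed, while the response time stayed low, is not reported as a slow disk.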
  • the determining, according to the N real-time data, whether the I/O response time is abnormal includes:
  • the N real-time data are used as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
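The load-range variant only adds a range test to the distance test. In the sketch below the cluster center and the predetermined distance are taken as already computed inputs; the function name, the index of the load indicator, and the example numbers are assumptions.

```python
import math


def is_abnormal_under_load(sample, center, dist_limit, load_index, load_range):
    """Abnormal if the point is far from the cluster center AND the load
    indicator lies inside the preset load range."""
    low, high = load_range
    far_from_center = math.dist(sample, center) > dist_limit
    load_in_range = low <= sample[load_index] <= high
    return far_from_center and load_in_range


# Assumed values: dimensions are (response time in ms, load indicator).
center = (11.0, 100.0)
dist_limit = 10.05
load_range = (50.0, 200.0)
```

Restricting the check to a load range keeps samples taken under unusually heavy or light load from being flagged purely because they sit far from the historical cluster.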
  • the determining, according to the N real-time data, whether the I/O response time is abnormal includes:
  • the N real-time data are used as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
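The bullets for this implementation list a cluster center, a predetermined distance, a response time threshold, and a load range. Assuming that all three checks are meant to be combined (the extracted text does not state this explicitly), a sketch could look as follows; `Limits` and `is_abnormal` are hypothetical names and the numbers are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Precomputed limits derived from the M historical data points."""
    center: tuple        # cluster center in the N-dimensional space
    dist_limit: float    # predetermined distance
    rt_limit: float      # I/O response time threshold
    load_range: tuple    # (minimum, maximum) of the preset load range


def is_abnormal(sample, limits, rt_index=0, load_index=1):
    """Abnormal only when all three conditions hold (assumed combination)."""
    low, high = limits.load_range
    return (math.dist(sample, limits.center) > limits.dist_limit
            and sample[rt_index] > limits.rt_limit
            and low <= sample[load_index] <= high)


# Assumed values: dimensions are (response time in ms, load indicator).
limits = Limits(center=(11.0, 100.0), dist_limit=10.05,
                rt_limit=12.0, load_range=(50.0, 200.0))
```

Each added condition narrows what is reported, which trades a lower false-positive rate against the risk of missing slow disks outside the preset load range.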
  • a second aspect of the present invention provides an apparatus for detecting a magnetic disk, including:
  • a data collection unit, configured to collect a set of N real-time data corresponding to N input/output (I/O) related indicators of the disk; wherein the N I/O related indicators include the I/O response time of the disk and an indicator that affects the I/O response time; the I/O response time is the time from when an application issues an operation request to when the disk's response to the request is received; N is an integer greater than or equal to 2;
  • a processing unit, configured to determine, according to the N real-time data, whether the I/O response time is abnormal; an abnormal I/O response time indicates that the disk cannot run services normally; a normal I/O response time indicates that the disk can run services normally;
  • if the I/O response time is abnormal, a detection result is output; the detection result indicates that the I/O response time is abnormal.
  • the processing unit is configured to: determine which of the at least two preset intervals of each of the remaining N-1 I/O related indicators (those other than the I/O response time) the corresponding N-1 real-time data fall in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively;
  • the at least two preset intervals of each of the N-1 I/O related indicators are: at least two subintervals obtained by dividing the full range between the first value and the second value that the I/O related indicator can support;
  • the I/O response time threshold is less than or equal to: the maximum I/O response time at which the disk runs normally when the N-1 real-time data are in their corresponding preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, it is determined that the I/O response time is abnormal.
  • the processing unit is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; if the first distance is greater than the predetermined distance, it is determined that the I/O response time is abnormal.
  • the processing unit is configured to: use the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data; the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold; the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user; if the first distance is greater than
  • the processing unit is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • the processing unit is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • a third aspect of the present invention provides an electronic device, including:
  • a memory for storing data used by the processor
  • the processor is configured to collect a set of N real-time data corresponding to N input/output (I/O) related indicators of the disk; wherein the N I/O related indicators include the I/O response time of the disk and an indicator that affects the I/O response time; the I/O response time is the time from when an application issues an operation request to when the disk's response to the request is received; N is an integer greater than or equal to 2;
  • the processor is further configured to determine, according to the N real-time data, whether the I/O response time is abnormal; an abnormal I/O response time indicates that the disk cannot run services normally; a normal I/O response time indicates that the disk can run services normally; if the I/O response time is abnormal, a detection result is output, and the detection result indicates that the I/O response time is abnormal.
  • the processor is configured to: determine which of the at least two preset intervals of each of the remaining N-1 I/O related indicators (those other than the I/O response time) the corresponding N-1 real-time data fall in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively; the at least two preset intervals of each of the N-1 I/O related indicators are: at least two subintervals obtained by dividing the full range between the first value and the second value that the I/O related indicator can support;
  • the I/O response time threshold is less than or equal to: the maximum I/O response time at which the disk runs normally when the N-1 real-time data are in their corresponding preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, it is determined that the I/O response time is abnormal.
  • the processor is configured to: use the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data; the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; if the first distance is greater than the predetermined distance, it is determined that the I/O response time is abnormal.
  • the processor is configured to: use the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data; the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold; the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points is
  • the processor is configured to: use the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • the processor is configured to: use the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • a fourth aspect of the present invention provides a disk system, including:
  • a disk controller, configured to collect a set of N real-time data corresponding to N input/output (I/O) related indicators of the disk; wherein the N I/O related indicators include the I/O response time of the disk and an indicator that affects the I/O response time; the I/O response time is the time from when an application issues an operation request to when the disk's response to the request is received; N is an integer greater than or equal to 2;
  • the disk controller is further configured to determine, according to the N real-time data, whether the I/O response time is abnormal; an abnormal I/O response time indicates that the disk cannot run services normally; a normal I/O response time indicates that the disk can run services normally; if the I/O response time is abnormal, a detection result is output, and the detection result indicates that the I/O response time is abnormal.
  • the disk controller is configured to: determine which of the at least two preset intervals of each of the remaining N-1 I/O related indicators (those other than the I/O response time) the corresponding N-1 real-time data fall in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively;
  • the at least two preset intervals of each of the N-1 I/O related indicators are: at least two subintervals obtained by dividing the full range between the first value and the second value that the I/O related indicator can support;
  • the I/O response time threshold is less than or equal to: the maximum I/O response time at which the disk runs normally when the N-1 real-time data are in their corresponding preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, it is determined that the I/O response time is abnormal.
  • the disk controller is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; if the first distance is greater than the predetermined distance, it is determined that the I/O response time is abnormal.
  • the disk controller is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators; determine whether a first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response
  • the disk controller is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • the disk controller is configured to: use the N real-time data as one data point in an N-dimensional coordinate system;
  • the N dimensions of the coordinate system correspond one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators that were collected before the N real-time data;
  • the predetermined distance is: a distance such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range acceptable to the user;
  • the I/O response time threshold is: a value such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is less than the probability range acceptable to the user;
  • the preset load range is: all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator representing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • a fifth aspect of the present invention provides an electronic device, including:
  • a disk system according to any one of the fourth aspect, wherein the processor is configured to read and write data in the disk.
  • real-time data of the I/O response time of the disk and of the indicators affecting the I/O response time are collected. Then, based on the real-time data of these multiple indicators, it is determined whether the I/O response time is abnormal, which makes it possible to detect whether a slow disk is present. Further, the embodiment of the present invention refers to other I/O indicators that have an impact on the I/O response time to determine whether the I/O response time is abnormal, so the method is closer to the actual situation. Therefore, the detection results of the method in the embodiment of the present invention are more accurate, and false negatives and false positives can be reduced.
  • FIG. 1 is a flowchart of a method for detecting a disk according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a specific example of a cluster center and a predetermined distance according to an embodiment of the present invention
  • FIG. 3 is a functional block diagram of an apparatus for detecting a magnetic disk according to an embodiment of the present invention
  • FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present invention.
  • Embodiments of the present invention provide a method and apparatus for detecting a disk to detect whether a slow disk is present.
  • FIG. 1 is a flowchart of a method for detecting a magnetic disk according to an embodiment of the present invention. As shown in Figure 1, the method includes the following:
  • Step 101: Collect a set of N real-time data corresponding to N I/O related indicators of the disk, wherein the N I/O related indicators include the I/O response time of the disk and indicators affecting the I/O response time; the I/O response time is the time from when an application issues an operation request to when the response of the disk to the request is received; N is an integer greater than or equal to 2;
  • Step 102: Determine whether the I/O response time is abnormal according to the N real-time data; an abnormal I/O response time indicates that the disk cannot run the service normally, and a normal I/O response time indicates that the disk can run the service normally;
  • Step 103: If the I/O response time is abnormal, output a detection result, the detection result being used to indicate that the I/O response time is abnormal.
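Steps 101 to 103 can be sketched as a single detection pass. This is only an illustration under stated assumptions, not the patented implementation: the names `detect_once`, `collect`, `judge`, and `report`, and the use of the key `"await"` for the I/O response time, are hypothetical and introduced here.

```python
from typing import Callable, Dict

Sample = Dict[str, float]

def detect_once(collect: Callable[[], Sample],
                judge: Callable[[Sample], bool],
                report: Callable[[str], None],
                response_key: str = "await") -> bool:
    """Run steps 101-103 once; return True if an anomaly was reported."""
    sample = collect()  # step 101: one set of N real-time data
    if len(sample) < 2 or response_key not in sample:
        raise ValueError("need N >= 2 indicators including the I/O response time")
    if judge(sample):  # step 102: decide whether the response time is abnormal
        report("I/O response time abnormal")  # step 103: output the result
        return True
    return False

# Hypothetical collector and judgment rule, for illustration only.
messages = []
hit = detect_once(lambda: {"await": 80.0, "r/s": 120.0},
                  lambda s: s["await"] > 50.0,
                  messages.append)
```

The judgment rule is passed in as a function so that any of the implementations of step 102 described below could be plugged in.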
  • the I/O related indicators that affect the I/O response time are, for example, indicators that reflect the number of I/Os, the size of the I/O, and the like.
  • the N I/O related indicators are, for example, indicators monitored by the iostat tool: the number of read requests merged per second in the queue (rrqm/s), the number of write requests merged per second in the queue (wrqm/s), and further per-second indicators.
  • the number of N can be set according to actual conditions. It can usually be set with reference to the computational performance of the device performing the method shown in FIG. 1.
  • steps 101 to 103 can be performed periodically. For example, it can be set to execute every 3 seconds. Of course, it can be understood that steps 101 to 103 can also be performed according to the time point set by the user, for example, at 00:00 every day. Steps 101 to 103 may also be performed according to a trigger operation input by the user.
  • data of N I/O related indicators can be collected simultaneously at each collection time. For example, at the first moment, data of r/s and data of rsec/s are simultaneously acquired.
  • step 102 is performed to determine whether the I/O response time is abnormal according to the N real-time data.
  • there are various implementations of step 102, which are described in detail below.
  • step 102 includes: determining, for each of the remaining N-1 I/O related indicators other than the I/O response time, which of the at least two preset intervals of that indicator the corresponding real-time data falls in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively;
  • the at least two preset intervals of each of the N-1 I/O related indicators are at least two subinterval ranges obtained by dividing the large range between the first value and the second value that the indicator can support; determining whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals;
  • the I/O response time threshold is less than or equal to the maximum I/O response time value at which the disk can run the service normally when the N-1 real-time data are in their corresponding preset intervals.
  • the large range between the first value and the second value that each I/O related indicator can support is divided into at least two subinterval ranges.
  • N-1 indicators include indicator 1 and indicator 2.
  • the large range of indicator 1 is divided into three sub-interval ranges, which are three preset intervals, for example, preset interval 1, preset interval 2, and preset interval 3.
  • the large range of the indicator 2 is divided into three sub-interval ranges, which are respectively three preset intervals, which are also referred to as a preset interval 1, a preset interval 2, and a preset interval 3. See Table 1 for details.
  • the I/O response time threshold corresponding to the combination of the preset intervals of each indicator is less than or equal to: the maximum I/O response time value of the service that the disk can normally run when the N-1 real-time data are in the corresponding preset intervals.
  • the corresponding I/O response time threshold is the I/O response time threshold 7.
  • the I/O response time threshold 7 is less than or equal to the maximum I/O response time value at which the disk can run the service normally when indicator 1 is in preset interval 1 and indicator 2 is in preset interval 3.
  • after the real-time data of indicator 1 and indicator 2 are collected, it is determined which preset interval of indicator 1 the real-time data of indicator 1 falls in, and which preset interval of indicator 2 the real-time data of indicator 2 falls in. Assume the result is that the real-time data of indicator 1 is in preset interval 3 of indicator 1 and the real-time data of indicator 2 is in preset interval 2 of indicator 2; the corresponding I/O response time threshold is then I/O response time threshold 6.
  • the real-time data of the I/O response time collected in step 101 is then compared with that I/O response time threshold. If the real-time data corresponding to the I/O response time exceeds I/O response time threshold 6, it is determined that the I/O response time is abnormal.
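The interval-combination lookup described above can be sketched as follows. All interval boundaries and threshold values below are hypothetical placeholders (the description gives no numeric values), and the names `INTERVAL_BOUNDS`, `THRESHOLD_MS`, `interval_index`, and `is_abnormal` are introduced only for this illustration.

```python
from bisect import bisect_right

# Hypothetical boundaries separating preset intervals 1, 2 and 3 of each indicator.
INTERVAL_BOUNDS = {
    "indicator1": [100.0, 500.0],
    "indicator2": [64.0, 512.0],
}

# Hypothetical thresholds in milliseconds, keyed by
# (preset interval of indicator 1, preset interval of indicator 2).
THRESHOLD_MS = {
    (1, 1): 10.0, (2, 1): 15.0, (3, 1): 20.0,
    (1, 2): 20.0, (2, 2): 25.0, (3, 2): 30.0,
    (1, 3): 40.0, (2, 3): 45.0, (3, 3): 50.0,
}

def interval_index(value, bounds):
    """Return the 1-based preset interval; a value equal to a bound falls in the next interval."""
    return bisect_right(bounds, value) + 1

def is_abnormal(response_ms, readings):
    """Compare the response time with the threshold of the matching interval combination."""
    key = tuple(interval_index(readings[name], INTERVAL_BOUNDS[name])
                for name in ("indicator1", "indicator2"))
    return response_ms > THRESHOLD_MS[key]

# Indicator 1 in preset interval 3, indicator 2 in preset interval 2.
result = is_abnormal(35.0, {"indicator1": 600.0, "indicator2": 100.0})
```

With these placeholder values the combination (3, 2) selects a 30 ms threshold, so a 35 ms response time is judged abnormal.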
  • Table 1 can be replaced by another form, as shown in Table 2.
  • each preset interval of the impact indicator corresponds to a level division, for example, divided into three levels: high, medium, and low.
  • the I/O response time threshold corresponding to the level combination of each impact indicator is also a level division, but the meaning of the I/O response time threshold is the same as described above.
  • step 102 includes: using the N real-time data as one data point in an N-dimensional coordinate system, the N dimensions of the N-dimensional coordinate system corresponding one-to-one to the N I/O related indicators; determining whether the first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators collected before the N real-time data are collected; the predetermined distance is such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range that the user can accept; if the first distance is greater than the predetermined distance, determining that the I/O response time is abnormal.
  • M N-dimensional data points corresponding to N I/O related indicators of the disk are collected.
  • the N I/O related indicators are sampled M times, and each sampling collects a set of N data corresponding to the N I/O related indicators.
  • data of N I/O related indicators can be collected simultaneously at each sampling time. For example, at the first sampling instant, data of r/s and data of rsec/s are simultaneously acquired. If the N I/O related indicators are in one-to-one correspondence with the N dimensions of the N-dimensional coordinate system, the N data collected at each sampling correspond to one N-dimensional data point in the N-dimensional coordinate system, so after M samplings, M N-dimensional data points corresponding to the N I/O related indicators are obtained.
  • the acquisition of M N-dimensional data points is performed while the disk is in normal operation.
  • the disk is not used yet, so it is close to the normal state under ideal conditions, so the baseline data can be collected at this time.
  • the disk is not a new disk, but has been running in the system for some time.
  • disks that have been running for a while may be worn out on the hardware, so there is some deviation from the normal state under ideal conditions.
  • M is an integer greater than or equal to 1.
  • the number of M can be set according to the actual situation. It can usually be set with reference to the computational performance of the device performing the method shown in FIG. 1.
  • the collected M N-dimensional data points may be processed next.
  • there are various processing methods, such as clustering algorithms and function fitting methods. Either way, the system learns automatically without manual setting, and can adapt to different environment configurations and business scenarios.
  • a clustering algorithm will be taken as an example for description.
  • the present invention does not limit which clustering algorithm is used; examples are the k-means algorithm and the K-Medoids algorithm. Since the principles of these clustering algorithms are well known to those skilled in the art, the analysis process of each algorithm is not described in detail. For ease of understanding, the following briefly describes how to cluster the M N-dimensional data points with a clustering algorithm.
  • the clustering center is obtained by performing clustering processing on the M N-dimensional data points.
  • each of the N I/O related indicators may correspond to one dimension in an N-dimensional coordinate system.
  • the M data of each relevant indicator are the coordinate values in the corresponding dimension.
  • N = 2
  • the first I/O related indicator corresponds to the x-axis.
  • the second I/O related indicator corresponds to the y-axis.
  • the five 2D data points corresponding to the first I/O related indicator and the second I/O related indicator are: data point (3, 4), data point (4, 5), and data point (4, 6), data points (6, 8) and data points (3, 2).
  • the five data of the first I/O related indicator are (3, 4, 4, 6, 3).
  • the five data of the second I/O related indicator are (4, 5, 6, 8, 2). The five data of the first I/O related indicator are averaged to obtain 4, and the five data of the second I/O related indicator are averaged to obtain 5. The coordinate point (4, 5) is then the cluster center.
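The worked example above, in which the cluster center is the per-dimension average of the sampled points, can be reproduced with a few lines of code. This is a sketch of that single-cluster averaging only, not of a full k-means or K-Medoids run; the function name `cluster_center` is introduced here for illustration.

```python
def cluster_center(points):
    """Average the points dimension by dimension to get a single cluster center."""
    m = len(points)
    n = len(points[0])
    return tuple(sum(p[i] for p in points) / m for i in range(n))

# The five two-dimensional data points from the example above.
points = [(3, 4), (4, 5), (4, 6), (6, 8), (3, 2)]
center = cluster_center(points)  # (4.0, 5.0)
```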
  • step 101 more than one data point can be collected, that is, the value of M is larger, which can improve the accuracy of subsequent judgments.
  • the cluster center 201 is determined in the manner described above, and five data points are indicated in a two-dimensional coordinate system, such as the data points represented by reference numeral 202 in FIG. 2.
  • a closed boundary 203 can be determined based on the distance values of the M data points from the cluster center 201.
  • the boundary 203 may be formed by connecting the outermost data points 202, or may be a circular boundary whose radius is the largest distance between a data point 202 and the cluster center 201.
  • d in FIG. 2 can be taken as a predetermined distance.
  • the M distance values indicate that the M N-dimensional data points corresponding to the N I/O correlation indicators are mapped to values in the N-dimensional coordinate system.
  • the first distance value represents that N real-time data is mapped to a value in an N-dimensional coordinate system as one of the data points in an N-dimensional coordinate system.
  • the probability range that the user can accept refers to the maximum value the user tolerates for the probability that the M distance values are greater than the predetermined distance, for example, 5%. In other words, the user expects at least 95% of the M distance values to be less than the predetermined distance. From a statistical point of view, because the predetermined distance is set such that the probability that the M distance values are greater than it is smaller than the probability range that the user can accept, the probability that the first distance of a normal data point is greater than the predetermined distance is likewise less than that range, that is, less than the abnormal probability that the user can tolerate.
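One way to read this rule is as an empirical quantile of the M distance values. The sketch below picks the smallest observed distance for which the fraction of larger distances is below the acceptable probability. The function name `predetermined_distance`, the use of Euclidean distance, and the choice of the smallest qualifying value are assumptions made for illustration; the description does not prescribe them.

```python
import math

def predetermined_distance(points, center, acceptable):
    """Smallest observed distance d with fraction(distances > d) < acceptable."""
    distances = sorted(math.dist(p, center) for p in points)
    m = len(distances)
    for d in distances:
        exceeding = sum(1 for x in distances if x > d)
        if exceeding / m < acceptable:
            return d
    return distances[-1]  # fallback when no candidate qualifies (acceptable <= 0)

points = [(3, 4), (4, 5), (4, 6), (6, 8), (3, 2)]
center = (4.0, 5.0)
d_5pct = predetermined_distance(points, center, 0.05)   # only the largest distance qualifies
d_25pct = predetermined_distance(points, center, 0.25)  # one of five points may lie outside
is_far = math.dist((9, 9), center) > d_5pct              # a real-time point far from the center
```

With only five samples a 5% tolerance forces the boundary out to the farthest point, which illustrates why a larger M gives a more useful predetermined distance.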
  • the value of k is based on the principle that the probability that the M distance values are greater than the predetermined distance is less than the probability range that the user can accept.
  • the predetermined distance is determined by the following formula:
  • α is a preset constant greater than 0 and less than 1;
  • P represents the probability that the M distance values are greater than Zα;
  • A is the first distance value;
  • the values of k and α are chosen such that the probability that a distance value is greater than the predetermined distance is less than the probability range that the user can accept.
  • in step 102, the N real-time data in the real-time data point can similarly be substituted into the fitted function to determine whether the function is satisfied, and thereby to judge whether the I/O response time is abnormal.
  • N I/O related indicators are processed together to obtain a usable reference value, the real-time data of the N I/O related indicators are processed in the same manner, and the result is compared with the reference value to judge whether the I/O response time is abnormal, so the judgment is closer to the actual situation. Therefore, the method in the embodiment of the present invention can reduce false negatives and false positives.
  • the method further includes: when the first distance does not exceed the predetermined distance, the data point formed by the N real-time data collected in step 101 may be considered data under normal conditions, so it can be combined with the M N-dimensional data points obtained in the foregoing steps and clustering processing performed again, thereby updating the cluster center and the predetermined distance.
  • the updated cluster center and predetermined distance are more accurate, so the judgment made in a subsequent step 102 is more accurate and misjudgment is less likely.
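The update described above can be sketched as follows, assuming the single-cluster averaging used in the earlier two-dimensional example and Euclidean distance. The function name `absorb_if_normal` is hypothetical; recomputing the predetermined distance from the merged points would follow the same pattern and is omitted for brevity.

```python
import math

def absorb_if_normal(points, new_point, center, predetermined):
    """Merge new_point into the baseline and recompute the center if it is within range."""
    if math.dist(new_point, center) > predetermined:
        return points, center  # abnormal sample: baseline left unchanged
    merged = points + [new_point]
    m = len(merged)
    new_center = tuple(sum(p[i] for p in merged) / m for i in range(len(center)))
    return merged, new_center

baseline = [(3, 4), (4, 5), (4, 6), (6, 8), (3, 2)]
merged, moved = absorb_if_normal(baseline, (4, 8), (4.0, 5.0), 4.0)   # within range
same, kept = absorb_if_normal(baseline, (10, 10), (4.0, 5.0), 4.0)    # outside range
```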
  • a third possible implementation differs from the second possible implementation in that, in addition to determining whether the first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance,
  • step 102 further includes: determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold; the I/O response time threshold is such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is smaller than the probability range that the user can accept; if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, it is determined that the I/O response time is abnormal.
  • M N-dimensional data points have been collected, and the M N-dimensional data points are obtained by M sampling, and each time sampling a set of N data corresponding to N I/O related indicators. Since the N I/O related indicators include the I/O response time, data of M I/O response times are collected after M sampling.
  • the preset I/O response time threshold is such that the probability that the I/O response time of the M data is greater than the I/O response time threshold is less than the probability range that the user can accept.
  • the manner of determining the I/O response time threshold may include, but is not limited to, the following methods.
  • First: manual setting.
  • Second: calculating the mean and standard deviation of the M data of the I/O response time, and determining the I/O response time threshold according to the mean and the standard deviation.
  • the mean plus the product of the standard deviation and k is the I/O response time threshold.
  • the value of k is chosen such that the probability that the M data are greater than the I/O response time threshold is less than the probability range that the user can accept.
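The mean-plus-k-standard-deviations rule can be sketched as below. The sample values, the choice k = 2, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation are assumptions for illustration; the description does not specify them.

```python
import statistics

def threshold_mean_std(samples, k):
    """I/O response time threshold = mean + k * standard deviation."""
    return statistics.mean(samples) + k * statistics.pstdev(samples)

def fraction_above(samples, threshold):
    """Fraction of the M baseline samples that exceed the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

# Hypothetical baseline response times in milliseconds.
response_ms = [10.0, 12.0, 11.0, 13.0, 14.0]
threshold = threshold_mean_std(response_ms, k=2)
exceed = fraction_above(response_ms, threshold)
```

A candidate k can be checked by confirming that `fraction_above` stays below the probability the user can accept, and increased if it does not.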
  • with this method the system learns the I/O response time threshold automatically, without manual setting, and can adapt to different environment configurations and service scenarios. Further, since the learned threshold is generally lower than the manually set threshold of the prior art, a slow disk can be found earlier and damage to the business can be prevented in advance.
  • Third: the I/O response time threshold may be determined by calculating two quantile values from the M data corresponding to the I/O response time, referred to as the first quantile value and the second quantile value;
  • the I/O response time threshold is then determined based on the first quantile value and the second quantile value. For example, let the first quantile value be the 25% quantile, denoted Q1, and the second quantile value be the 75% quantile, denoted Q3.
  • the I/O response time threshold can be calculated by the following formula: Q3+k*(Q3-Q1).
  • the value of k is based on the principle that the probability that the M data is greater than the I/O response time threshold is less than the probability range that the user can accept.
  • with this method, too, the system learns the I/O response time threshold automatically.
  • how the first quantile value and the second quantile value are calculated from the M data is well known to those skilled in the art and is not described here.
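The quantile-based formula Q3+k*(Q3-Q1) can be sketched with the standard library. The sample values and the default k = 1.5 (the conventional box-plot fence) are assumptions for illustration, and `statistics.quantiles` uses its default "exclusive" interpolation method here; other quantile definitions would give slightly different values.

```python
import statistics

def threshold_quantile(samples, k=1.5):
    """I/O response time threshold = Q3 + k * (Q3 - Q1)."""
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    return q3 + k * (q3 - q1)

# Hypothetical baseline response times in milliseconds.
response_ms = [10, 11, 12, 13, 14, 15, 16]
threshold = threshold_quantile(response_ms)  # Q1 = 11, Q3 = 15, threshold = 21.0
```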
  • a fourth possible implementation differs from the second possible implementation in that, in addition to determining whether the first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance,
  • step 102 further includes: determining whether the real-time data corresponding to the indicator characterizing the load size among the N I/O related indicators is within a preset load range; the preset load range is the full range or a partial range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator characterizing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
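The combined condition of this implementation (distance beyond the predetermined distance and load inside the preset range) can be sketched as follows. The function name `abnormal_with_load`, the Euclidean distance, the inclusive range bounds, and all numeric values are assumptions for illustration.

```python
import math

def abnormal_with_load(point, center, predetermined, load, load_range):
    """Abnormal only if the point is far from the center and the load is within range."""
    low, high = load_range
    far = math.dist(point, center) > predetermined
    in_range = low <= load <= high
    return far and in_range

center = (4.0, 5.0)
far_normal_load = abnormal_with_load((9, 9), center, 3.6, 200.0, (50.0, 400.0))
far_overload = abnormal_with_load((9, 9), center, 3.6, 1000.0, (50.0, 400.0))
near_normal_load = abnormal_with_load((4, 6), center, 3.6, 200.0, (50.0, 400.0))
```

Gating on the load range means that a long response time observed outside the preset load range is not, by itself, reported as a slow disk.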
  • the manner of determining the preset load range may include, but is not limited to, the following.
  • the preset load range is manually set.
  • the N I/O related indicators may include indicators that characterize the load size (e.g., r/s, w/s, rsec/s, wsec/s), in which case M data of the indicators characterizing the load size have already been collected in the foregoing steps; or the N I/O related indicators may not include indicators characterizing the load size, in which case M data of the indicators characterizing the load size are additionally collected.
  • the first range between the minimum data and the maximum data among the M data is a preset range.
  • the second range is a preset range, wherein the second range is included in the first range.
  • the first range is the range between the minimum load and the maximum load that the disk can support.
  • the step 102 includes: using the N real-time data as one data point in an N-dimensional coordinate system; the N dimensions of the N-dimensional coordinate system correspond one-to-one to the N I/O related indicators.
  • the method for determining the preset load range may be the same as the foregoing manner.
  • the predetermined distance, the cluster center, and the I/O response time threshold are the same as those in the foregoing embodiment, and therefore are not described herein again.
  • the detection result is output in step 103, for example, by printing a log, raising an alarm, displaying it on an interface, or reporting it to a processing module.
  • the user or the processing module is notified, and in the case of a slow disk, the user or the processing module can take measures such as isolating the disk.
  • referring to FIG. 3, a functional block diagram of an apparatus for detecting a disk according to an embodiment of the present application is provided, for implementing the method for detecting a disk shown in FIG. 1 to FIG. 2.
  • for the meaning of the terms involved in this embodiment, refer to the content described in the foregoing embodiments.
  • the device for detecting a disk includes: a data collection unit 301, configured to collect a set of N real-time data corresponding to N input/output (I/O) related indicators of the disk, wherein the N I/O related indicators include the I/O response time of the disk and indicators affecting the I/O response time; the I/O response time is the time from when an application issues an operation request to when the response of the disk to the request is received; N is an integer greater than or equal to 2; and a processing unit 302, configured to determine, according to the N real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run the service normally and a normal I/O response time indicates that the disk can run the service normally, and, if the I/O response time is abnormal, to output a detection result, the detection result being used to indicate that the I/O response time is abnormal.
  • the processing unit 302 is configured to: determine, for each of the remaining N-1 I/O related indicators other than the I/O response time, which of the at least two preset intervals of that indicator the corresponding real-time data falls in, so that the N-1 real-time data corresponding to the remaining N-1 I/O related indicators fall in N-1 preset intervals respectively; the at least two preset intervals of each of the N-1 I/O related indicators are at least two subinterval ranges obtained by dividing the large range between the first value and the second value that the indicator can support;
  • the I/O response time threshold is less than or equal to the maximum I/O response time value at which the disk can run the service normally when the N-1 real-time data are in their corresponding preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, it is determined that the I/O response time is abnormal.
  • the processing unit 302 is configured to: use the N real-time data as one data point in an N-dimensional coordinate system, the N dimensions of the N-dimensional coordinate system corresponding one-to-one to the N I/O related indicators; determine whether the first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators collected before the N real-time data are collected; the predetermined distance is such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range that the user can accept; if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
  • the processing unit 302 is configured to: use the N real-time data as one data point in an N-dimensional coordinate system, the N dimensions of the N-dimensional coordinate system corresponding one-to-one to the N I/O related indicators; determine whether the first distance of the data point from a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators collected before the N real-time data are collected; the predetermined distance is such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range that the user can accept; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold; the I/O response time threshold is such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is smaller than the probability range that the user can accept;
  • the processing unit 302 is configured to: use the N real-time data as one data point in an N-dimensional coordinate system, the N dimensions of the N-dimensional coordinate system corresponding one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators collected before the N real-time data are collected;
  • the predetermined distance is such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range that the user can accept;
  • the preset load range is the full range or a partial range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator characterizing the load size is within the preset load range, determining that the I/O response time is abnormal.
  • the processing unit 302 is configured to: use the N real-time data as one data point in an N-dimensional coordinate system, the N dimensions of the N-dimensional coordinate system corresponding one-to-one to the N I/O related indicators;
  • the cluster center is: a center point obtained by clustering the M N-dimensional data points corresponding to the N I/O related indicators collected before the N real-time data are collected;
  • the predetermined distance is such that the probability that the M distance values of the M N-dimensional data points from the cluster center are greater than the predetermined distance is less than the probability range that the user can accept;
  • the I/O response time threshold is such that the probability that the M data corresponding to the I/O response time among the M N-dimensional data points are greater than the I/O response time threshold is smaller than the probability range that the user can accept;
  • the preset load range is the full range or a partial range between the minimum load and the maximum load that the disk can support;
  • when the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold and the real-time data corresponding to the indicator characterizing the load size is within the preset load range, it is determined that the I/O response time is abnormal.
  • the electronic device includes a processor 401, a transmitter 402, a receiver 403, a memory 404, and an I/O interface 405.
  • the processor 401 may be a general-purpose central processing unit (CPU), may be an application specific integrated circuit (ASIC), and may be one or more integrated circuits for controlling program execution.
  • the number of memories 404 can be one or more.
  • the memory 404, the receiver 403, and the transmitter 402 are connected to the processor 401 via a bus.
  • the receiver 403 and the transmitter 402 are configured to perform network communication with an external device, and specifically can communicate with the external device through a network such as an Ethernet, a wireless access network, or a wireless local area network.
  • Receiver 403 and transmitter 402 may be physically separate components or may be physically identical components.
  • the I/O interface 405 can be used to connect peripherals such as a mouse and a keyboard.
  • a disk, such as a hard disk, a USB disk, or a disk array, is installed in or connected to the electronic device.
  • the electronic device may be a user side device or a network side device.
  • the memory 404 is configured to store data used by the processor 401;
  • the processor 401 is configured to collect a set of N real-time data corresponding to N input/output (I/O) related indicators of the disk, where the N I/O related indicators include the I/O response time of the disk and indicators affecting the I/O response time; the I/O response time is the time from when an application issues an operation request to when the response of the disk to the request is received; N is an integer greater than or equal to 2.
  • the processor 401 is further configured to determine, according to the N real-time data, whether the I/O response time is abnormal; an abnormal I/O response time indicates that the disk cannot run the service normally, and a normal I/O response time indicates that the disk can run the service normally; if the I/O response time is abnormal, a detection result is output, the detection result being used to indicate that the I/O response time is abnormal.
  • The processor 401 is configured to: determine which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
  • determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals;
  • if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.
  • The processor 401 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
  • The processor 401 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.
  • The processor 401 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
  • determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
  • determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.
  • The processor 401 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
  • determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data;
  • the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
  • determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
  • determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support;
  • if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.
  • an embodiment of the present invention further provides a disk system including a disk and a disk controller.
  • The disk controller is configured to perform the method described above with reference to FIG. 1 and its embodiments. For details, see that description; they are not repeated here.
  • an embodiment of the present invention further provides an electronic device.
  • the structure of the electronic device is as shown in FIG. 4, except that the electronic device in this embodiment further includes the disk system.
  • In the embodiments of the present invention, real-time data of the I/O response time of the disk and of the indicators that affect the I/O response time is collected. Whether the I/O response time is abnormal is then determined from the real-time data of multiple indicators, so that a slow disk can be detected. Further, because the embodiments also take into account other I/O indicators that affect the I/O response time when determining whether it is abnormal, the determination is closer to the actual situation. The detection result is therefore more accurate, and false negatives and false positives can be reduced.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.


Abstract

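A method and apparatus for detecting a disk. The method includes: collecting a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of a disk (step 101), where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time, the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received, and N is an integer greater than or equal to 2; determining, according to the N pieces of real-time data, whether the I/O response time is abnormal (step 102), where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and, if the I/O response time is abnormal, outputting a detection result that indicates that the I/O response time is abnormal (step 103).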

Description

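Method and apparatus for detecting a disk
This application claims priority to Chinese Patent Application No. 201510465856.0, filed with the Chinese Patent Office on July 31, 2015 and entitled "Method and apparatus for detecting a disk", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to a method and apparatus for detecting a disk.
Background
Magnetic degradation, bad sectors, vibration, and other mechanical or environmental problems can lengthen the time a disk takes to respond to read/write input/output (I/O) requests. A disk that responds to I/O requests too slowly is called a slow disk.
Slow disks are a major threat to the reliability of a storage system. In particular, in a redundant array of independent disks (Redundant Arrays of Inexpensive Disks, RAID) or a distributed storage system, a single slow disk can degrade the performance of the whole system and, in severe cases, interrupt services. Disks therefore need to be checked so that countermeasures, such as isolating the slow disk and backing up its data, can be taken in time.
Summary
Embodiments of the present invention provide a method and apparatus for detecting a disk, which are used to detect whether a slow disk is present.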
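A first aspect of the present invention provides a method for detecting a disk, including:
collecting a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of a disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received; and N is an integer greater than or equal to 2;
determining, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally, and a normal I/O response time indicates that the disk can run services normally; and
if the I/O response time is abnormal, outputting a detection result, where the detection result indicates that the I/O response time is abnormal.
With reference to the first aspect, in a first possible implementation of the first aspect, determining, according to the N pieces of real-time data, whether the I/O response time is abnormal includes:
determining which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
determining whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determining that the I/O response time is abnormal.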
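With reference to the first aspect, in a second possible implementation of the first aspect, determining, according to the N pieces of real-time data, whether the I/O response time is abnormal includes:
using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and
if the first distance is greater than the predetermined distance, determining that the I/O response time is abnormal.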
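With reference to the first aspect, in a third possible implementation of the first aspect, determining, according to the N pieces of real-time data, whether the I/O response time is abnormal includes:
using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determining that the I/O response time is abnormal.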
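With reference to the first aspect, in a fourth possible implementation of the first aspect, determining, according to the N pieces of real-time data, whether the I/O response time is abnormal includes:
using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determining whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determining that the I/O response time is abnormal.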
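With reference to the first aspect, in a fifth possible implementation of the first aspect, determining, according to the N pieces of real-time data, whether the I/O response time is abnormal includes:
using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
determining whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determining that the I/O response time is abnormal.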
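A second aspect of the present invention provides an apparatus for detecting a disk, including:
a data collection unit, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of a disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received; and N is an integer greater than or equal to 2; and
a processing unit, configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally, and a normal I/O response time indicates that the disk can run services normally; and, if the I/O response time is abnormal, to output a detection result, where the detection result indicates that the I/O response time is abnormal.
With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is configured to: determine which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.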
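With reference to the second aspect, in a second possible implementation of the second aspect, the processing unit is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
With reference to the second aspect, in a third possible implementation of the second aspect, the processing unit is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.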
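With reference to the second aspect, in a fourth possible implementation of the second aspect, the processing unit is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.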
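With reference to the second aspect, in a fifth possible implementation of the second aspect, the processing unit is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.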
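A third aspect of the present invention provides an electronic device, including:
a memory, configured to store data used by a processor; and
the processor, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of a disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received; and N is an integer greater than or equal to 2;
where the processor is further configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally, and a normal I/O response time indicates that the disk can run services normally; and, if the I/O response time is abnormal, to output a detection result, where the detection result indicates that the I/O response time is abnormal.
With reference to the third aspect, in a first possible implementation of the third aspect, the processor is configured to: determine which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.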
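With reference to the third aspect, in a second possible implementation of the third aspect, the processor is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
With reference to the third aspect, in a third possible implementation of the third aspect, the processor is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.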
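With reference to the third aspect, in a fourth possible implementation of the third aspect, the processor is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.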
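With reference to the third aspect, in a fifth possible implementation of the third aspect, the processor is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.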
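A fourth aspect of the present invention provides a disk system, including:
a disk; and
a disk controller, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of the disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received; and N is an integer greater than or equal to 2;
where the disk controller is further configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally, and a normal I/O response time indicates that the disk can run services normally; and, if the I/O response time is abnormal, to output a detection result, where the detection result indicates that the I/O response time is abnormal.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, the disk controller is configured to: determine which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.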
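With reference to the fourth aspect, in a second possible implementation of the fourth aspect, the disk controller is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
With reference to the fourth aspect, in a third possible implementation of the fourth aspect, the disk controller is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.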
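With reference to the fourth aspect, in a fourth possible implementation of the fourth aspect, the disk controller is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.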
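With reference to the fourth aspect, in a fifth possible implementation of the fourth aspect, the disk controller is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.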
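A fifth aspect of the present invention provides an electronic device, including:
the disk system according to any one of the fourth aspect to the fifth possible implementation of the fourth aspect; and a processor, configured to read and write data on the disk.
The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
In the embodiments of the present invention, real-time data of the I/O response time of a disk and of the indicators that affect the I/O response time is collected. Whether the I/O response time is abnormal is then determined from the real-time data of multiple indicators, so that a slow disk can be detected. Further, because the embodiments also take into account other I/O indicators that affect the I/O response time when determining whether it is abnormal, the determination is closer to the actual situation. The detection result is therefore more accurate, and missed reports and false alarms can be reduced.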
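Brief Description of Drawings
FIG. 1 is a flowchart of a method for detecting a disk according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a specific example of a cluster center and a predetermined distance according to an embodiment of the present invention;
FIG. 3 is a functional block diagram of an apparatus for detecting a disk according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present invention.
Description of Embodiments
Embodiments of the present invention provide a method and apparatus for detecting a disk, which are used to detect whether a slow disk is present.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The term "and/or" in this document describes only an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it.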
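Refer to FIG. 1, which is a flowchart of a method for detecting a disk according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 101: Collect a group of N pieces of real-time data in one-to-one correspondence with N I/O related indicators of a disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received; and N is an integer greater than or equal to 2.
Step 102: Determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally, and a normal I/O response time indicates that the disk can run services normally.
Step 103: If the I/O response time is abnormal, output a detection result, where the detection result indicates that the I/O response time is abnormal.
The I/O related indicators that affect the I/O response time are, for example, indicators that reflect the number of I/Os, the I/O size, and the like. The N I/O related indicators are, for example, at least two of the indicators monitored by the iostat tool: the number of read requests merged per second in the queue (rrqm/s), the number of write requests merged per second in the queue (wrqm/s), the number of read requests completed per second (r/s), the number of write requests completed per second (w/s), the number of sectors read per second (rsec/s), the number of sectors written per second (wsec/s), the average request size (avgrq-sz), the average request queue length (avgqu-sz), the average wait time per request (await), the average service time per request (svctm), and the device utilization (util).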
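As an illustration of step 101, the following minimal sketch turns one device row of iostat-style output into a group of N real-time values. The header and row strings are hypothetical sample text, not captured output; the function names are invented for this example; and treating `await` as the I/O response time is an assumption rather than something the description states.

```python
# Hypothetical sample text laid out like the extended iostat report described above.
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
ROW = "sda 0.00 1.50 12.00 8.00 960.00 640.00 80.00 0.40 20.00 2.50 5.00"


def parse_iostat_row(header_line, row_line):
    """Pair each column name in the header with the value in the device row."""
    names = header_line.split()
    fields = row_line.split()
    device = fields[0]
    values = {name: float(v) for name, v in zip(names[1:], fields[1:])}
    return device, values


def select_indicators(values, indicator_names):
    """Build the group of N real-time data items, in a fixed indicator order."""
    return tuple(values[name] for name in indicator_names)


device, metrics = parse_iostat_row(HEADER, ROW)
# N = 3 here: assumed response-time indicator first, then two influencing indicators.
sample = select_indicators(metrics, ("await", "r/s", "rsec/s"))
```

Keeping the indicator order fixed matters because the later implementations treat each group as a point whose coordinates correspond one-to-one to the indicators.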
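Specifically, the value of N can be set according to the actual situation, typically with reference to the computing performance of the device that executes the method shown in FIG. 1.
In practice, steps 101 to 103 can be performed periodically, for example once every 3 seconds. It can be understood that steps 101 to 103 can also be performed at a time point set by the user, for example at 00:00 every day, or in response to a trigger operation input by the user.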
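The periodic variant can be sketched as a simple loop. This is only an illustration under assumptions: `collect`, `detect`, and `report` are hypothetical callables standing in for steps 101, 102, and 103, and the `rounds` and `sleep` parameters exist only so the loop can be bounded and exercised without waiting.

```python
import time


def run_periodically(collect, detect, report, period_s=3.0, rounds=None, sleep=time.sleep):
    """Run steps 101 to 103 once per period; stop after `rounds` passes if given."""
    done = 0
    while rounds is None or done < rounds:
        sample = collect()          # step 101: one group of N real-time values
        if detect(sample):          # step 102: is the I/O response time abnormal?
            report(sample)          # step 103: output the detection result
        done += 1
        sleep(period_s)
```

A scheduled run (for example at 00:00 every day) or a user-triggered run would call the same three steps once instead of looping.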
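Specifically, the data of the N I/O related indicators can be collected simultaneously at each collection moment. For example, at a first moment, the r/s data and the rsec/s data are collected at the same time.
Whichever collection manner is used, after a group of N pieces of real-time data has been collected in a collection period, step 102 is performed, that is, whether the I/O response time is abnormal is determined according to the N pieces of real-time data. Step 102 can be implemented in multiple ways, which are described in detail below.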
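In a first possible implementation, step 102 includes: determining which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support; determining whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and, if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determining that the I/O response time is abnormal.
Specifically, for each of the remaining N-1 I/O related indicators, the full range between the first value and the second value that the indicator can support is divided into at least two sub-ranges. For example, the N-1 indicators include indicator 1 and indicator 2. The full range of indicator 1 is divided into three sub-ranges, which are three preset intervals: preset interval 1, preset interval 2, and preset interval 3. Similarly, the full range of indicator 2 is divided into three sub-ranges, which are also called preset interval 1, preset interval 2, and preset interval 3. See Table 1 for details.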
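Table 1 (image PCTCN2016080376-appb-000001, not reproduced): the I/O response time threshold for each combination of the preset intervals of indicator 1 and indicator 2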
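The I/O response time threshold corresponding to a combination of preset intervals of the indicators is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals. For example, when indicator 1 is in preset interval 1 and indicator 2 is in preset interval 3, the corresponding threshold is I/O response time threshold 7. I/O response time threshold 7 is less than or equal to the maximum I/O response time at which the disk can run services normally when indicator 1 is in preset interval 1 and indicator 2 is in preset interval 3.
Therefore, when the real-time data of indicator 1 and indicator 2 is collected, it is used to determine which preset interval of indicator 1 the real-time data of indicator 1 falls in, and which preset interval of indicator 2 the real-time data of indicator 2 falls in. Assume the result is that the real-time data of indicator 1 is in preset interval 3 of indicator 1 and the real-time data of indicator 2 is in preset interval 2 of indicator 2; the corresponding threshold is then I/O response time threshold 6.
Next, the real-time I/O response time collected in step 101 is compared with that threshold. If the real-time data corresponding to the I/O response time exceeds I/O response time threshold 6, the I/O response time is determined to be abnormal.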
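The interval lookup described above can be sketched as follows. All numbers are hypothetical: the description gives neither the interval boundaries nor the threshold values, so `BOUNDARIES` and `THRESHOLDS_MS` are placeholders chosen only to make the example run, and the function names are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical boundaries: two cut points split each indicator's range
# into preset intervals 1, 2, and 3.
BOUNDARIES = {
    "indicator_1": [100.0, 500.0],
    "indicator_2": [64.0, 256.0],
}

# Hypothetical thresholds in milliseconds, keyed by
# (preset interval of indicator 1, preset interval of indicator 2).
THRESHOLDS_MS = {
    (1, 1): 10.0, (2, 1): 15.0, (3, 1): 20.0,
    (1, 2): 20.0, (2, 2): 30.0, (3, 2): 40.0,
    (1, 3): 30.0, (2, 3): 45.0, (3, 3): 60.0,
}


def interval_index(value, cut_points):
    """Return the 1-based preset interval that contains value."""
    return bisect_right(cut_points, value) + 1


def exceeds_interval_threshold(response_ms, indicator_1, indicator_2):
    """Compare the response time with the threshold for the interval combination."""
    key = (
        interval_index(indicator_1, BOUNDARIES["indicator_1"]),
        interval_index(indicator_2, BOUNDARIES["indicator_2"]),
    )
    return response_ms > THRESHOLDS_MS[key]
```

With these placeholder values, a sample whose indicator 1 is in interval 3 and indicator 2 is in interval 2 is compared against the single entry stored under `(3, 2)`, mirroring the lookup of "I/O response time threshold 6" in the text.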
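In practice, Table 1 can also be replaced by another form, for example the one shown in Table 2.
Table 2 (image PCTCN2016080376-appb-000002, not reproduced): the level of the I/O response time threshold for each combination of levels of the influencing indicators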
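In this example, each preset interval of an influencing indicator corresponds to a level, for example high, medium, or low. The I/O response time threshold corresponding to each combination of levels of the influencing indicators is also expressed as a level, but its meaning is the same as described above.
As can be seen from the above, in the first possible implementation, whether the I/O response time is abnormal is determined only after the influence of the other N-1 indicators on the I/O response time has been taken into account. Compared with considering the I/O response time alone, the determination in this embodiment is more accurate.
In a second possible implementation, step 102 includes: using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determining that the I/O response time is abnormal.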
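Specifically, before step 101, M N-dimensional data points corresponding to the N I/O related indicators of the disk are collected. That is, the N I/O related indicators are sampled M times, and each sampling collects a group of N pieces of data corresponding to the N I/O related indicators. In practice, the data of the N I/O related indicators can be collected simultaneously at each sampling moment; for example, at a first sampling moment, the r/s data and the rsec/s data are collected at the same time. If the N I/O related indicators are placed in one-to-one correspondence with the N dimensions of an N-dimensional coordinate system, the N pieces of data collected in each sampling correspond to one N-dimensional data point in that coordinate system, so M samplings yield M N-dimensional data points corresponding to the N I/O related indicators.
Ideally, the M N-dimensional data points are collected while the disk is running normally. For example, when a new disk has just been installed in a system, the disk has not yet been used and is therefore close to the ideal normal state, so the baseline data can be collected at this point. In practice, however, the disk is often not new and has already been running in the system for some time. Compared with a new disk, such a disk may already show hardware wear and has therefore drifted somewhat from the ideal normal state. In this case, whether the disk is running normally can be judged manually, based on how the services carried on the disk are running. If the disk is judged to be running normally, without noticeable slowdown, the baseline data can be collected.
Specifically, M is an integer greater than or equal to 1. The value of M can be set according to the actual situation, typically with reference to the computing performance of the device that executes the method shown in FIG. 1.
After the M N-dimensional data points corresponding to the N I/O related indicators have been collected, they can be processed. In practice there are multiple ways to process them, for example a clustering algorithm or function fitting. In either case the system learns automatically, no manual setting is needed, and the approach adapts to different environment configurations and service scenarios.
This embodiment uses a clustering algorithm as an example. The present invention does not limit which clustering algorithm is used; for example, the k-means algorithm or the K-Medoids algorithm can be used. Because the principles of these clustering algorithms are well known to persons skilled in the art, their analysis processes are not described in detail. To aid understanding of the present invention, the following briefly describes how a clustering algorithm is used to cluster the M N-dimensional data points.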
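Specifically, in a possible implementation, the cluster center is obtained by clustering the M N-dimensional data points.
For example, each of the N I/O related indicators can correspond to one dimension of the N-dimensional coordinate system, and the M pieces of data of each indicator are the coordinate values in the corresponding dimension. For ease of description, assume that N is 2, which corresponds to a two-dimensional rectangular coordinate system. The first I/O related indicator corresponds to the x-axis, and the second I/O related indicator corresponds to the y-axis. Assume that five two-dimensional data points have been collected for the first and second I/O related indicators: (3, 4), (4, 5), (4, 6), (6, 8), and (3, 2). Accordingly, the five pieces of data of the first I/O related indicator are (3, 4, 4, 6, 3), and the five pieces of data of the second I/O related indicator are (4, 5, 6, 8, 2). Averaging the five pieces of data of the first indicator gives 4, and averaging the five pieces of data of the second indicator gives 5. The coordinate point (4, 5) is then the cluster center.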
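The worked example above amounts to a per-dimension mean, which is what a single-cluster k-means would converge to. A minimal sketch reproducing it follows; the function name `cluster_center` is chosen here for illustration only.

```python
def cluster_center(points):
    """Coordinate-wise mean of M N-dimensional points (the single-cluster case)."""
    n_dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(n_dims))


baseline = [(3, 4), (4, 5), (4, 6), (6, 8), (3, 2)]
center = cluster_center(baseline)
print(center)  # (4.0, 5.0)
```

In practice the indicators have different units and scales, so some normalization before computing distances may be worth considering; the description does not address this.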
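In practice, a larger number of baseline data points can be collected, that is, M can be large, which improves the accuracy of subsequent determinations. As shown in FIG. 2, the cluster center 201 is determined in the manner described above, and the data points are plotted in the two-dimensional coordinate system as the data points labeled 202 in FIG. 2. A closed boundary 203 can be determined from the distances between the M data points and the cluster center 201. The boundary 203 can be formed by connecting the outermost data points 202, or can be a circle whose radius is the largest distance between a data point 202 and the cluster center 201. In practice, d in FIG. 2 can be used as the predetermined distance. d is such that the probability that the M distance values from the M N-dimensional data points to the cluster center exceed the predetermined distance is less than the probability range acceptable to the user. Here, the M distance values are the distances from the M N-dimensional data points corresponding to the N I/O related indicators to the cluster center in the N-dimensional coordinate system, and the first distance is the distance from the data point formed by the N pieces of real-time data to the cluster center.
The probability range acceptable to the user is the maximum probability, for example 5%, with which the user is willing to have the M distance values exceed the predetermined distance. In other words, the user expects at least 95% of the M distance values to be less than the predetermined distance. From a statistical point of view, because the predetermined distance is set so that the probability of the M distance values exceeding it is less than the probability range acceptable to the user, when the first distance is compared with the predetermined distance, the probability that the first distance exceeds the predetermined distance is also less than that range, that is, less than the abnormality probability that the user can tolerate.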
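The "acceptable probability" condition can be checked empirically against the M baseline values. The sketch below is one hedged way to do so, with function names invented for illustration: it measures the fraction of baseline values above a candidate bound and picks the smallest baseline value whose exceedance fraction is strictly below the acceptable probability.

```python
def exceedance_ratio(values, bound):
    """Fraction of the baseline values that are greater than bound."""
    return sum(v > bound for v in values) / len(values)


def smallest_acceptable_bound(values, acceptable_probability):
    """Smallest baseline value whose exceedance ratio is below the acceptable probability."""
    for candidate in sorted(values):
        if exceedance_ratio(values, candidate) < acceptable_probability:
            return candidate
    raise ValueError("acceptable_probability must be greater than 0")
```

The same check applies whether the values are distances to the cluster center or baseline I/O response times.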
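For example, refer again to FIG. 2. If the data point formed by the N pieces of real-time data is the point labeled 204 in FIG. 2, the distance from data point 204 to the cluster center 201 is less than d, so the point does not exceed the predetermined distance and the I/O response time is determined to be normal. If the real-time data point is the point labeled 205, its distance from the cluster center 201 is greater than d, so it exceeds the predetermined distance and the I/O response time is determined to be abnormal.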
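The comparison for points such as 204 and 205 can be sketched as a Euclidean distance test. Euclidean distance is an assumption here, since the description does not fix a distance measure, and the center and d used in the usage lines are hypothetical values.

```python
import math


def beyond_predetermined_distance(point, center, predetermined_distance):
    """True if the real-time data point lies farther from the center than d."""
    return math.dist(point, center) > predetermined_distance


center = (4.0, 5.0)
d = 3.0  # hypothetical predetermined distance
inside = beyond_predetermined_distance((5.0, 6.0), center, d)    # like point 204
outside = beyond_predetermined_distance((9.0, 9.0), center, d)   # like point 205
```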
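Specifically, the predetermined distance can also be determined in multiple ways, examples of which are given below.
First, the mean and standard deviation of the M distance values are calculated, and the predetermined distance is determined from them: the predetermined distance is the sum of the mean and k times the standard deviation, where k is chosen so that the probability that the M distance values exceed the predetermined distance is less than the probability range acceptable to the user.
Second, a first quantile Q1 and a second quantile Q3 of the M distance values are calculated, and the predetermined distance is determined from them: the predetermined distance is Q3 + k*(Q3 - Q1), where k is chosen so that the probability that the M distance values exceed the predetermined distance is less than the probability range acceptable to the user.
Third, the predetermined distance is determined by the following formula:
A = k * Z_α,
where α is a preset constant greater than 0 and less than 1 that represents the probability that the M distance values are greater than Z_α, A is the predetermined distance, and k and α are chosen so that the probability that the M distance values exceed the predetermined distance is less than the probability range acceptable to the user.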
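The three options above can be sketched with Python's standard `statistics` module. This is an illustration under assumptions: the population standard deviation is used, Q1 and Q3 are taken as the 25th and 75th percentiles, and Z_α is read as the upper-α quantile of the standard normal distribution, consistent with the later remark about looking it up in a standard normal table. The function names are invented here.

```python
import math
from statistics import NormalDist, mean, pstdev, quantiles


def distances_to_center(points, center):
    """The M distance values from the baseline points to the cluster center."""
    return [math.dist(p, center) for p in points]


def distance_by_mean_std(distances, k):
    """Option 1: mean plus k times the standard deviation."""
    return mean(distances) + k * pstdev(distances)


def distance_by_quartiles(distances, k):
    """Option 2: Q3 + k * (Q3 - Q1)."""
    q1, _, q3 = quantiles(distances, n=4)
    return q3 + k * (q3 - q1)


def distance_by_normal_quantile(alpha, k):
    """Option 3: A = k * Z_alpha, with Z_alpha the upper-alpha standard normal quantile."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return k * z_alpha
```

Whichever option is used, the resulting value still has to be checked against the baseline so that the fraction of the M distance values above it stays below the acceptable probability.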
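Therefore, when the first distance from the data point to the cluster center is greater than the predetermined distance, the I/O response time is abnormal.
If the M N-dimensional data points are processed by fitting a function, then in step 102 the N pieces of real-time data of the real-time data point can likewise be substituted into the fitted function to determine whether they satisfy the function, and thereby whether the I/O response time is abnormal.
As can be seen from the above, the embodiments of the present invention process the N I/O related indicators together to obtain a reference value, process the real-time data of the N I/O related indicators in the same way, and then compare the result with the reference value to determine whether the I/O response time is abnormal. The determination is therefore closer to the actual situation, and the method in the embodiments can reduce missed reports and false alarms.
Optionally, after step 102, the method further includes: when the first distance does not exceed the predetermined distance, the data point formed by the N pieces of real-time data collected in step 101 can be regarded as data from normal operation. This data point can therefore be combined with the M N-dimensional data points obtained earlier and clustered again to update the cluster center and the predetermined distance. As the number of updates increases, the cluster center and the predetermined distance become more accurate, so subsequent determinations in step 102 are more accurate and less prone to misjudgment.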
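For the single-cluster mean used in the earlier worked example, re-clustering after adding one normal point is equivalent to a running-mean update. The sketch below shows only that special case and is not the general re-clustering the text describes; the predetermined distance would still need to be recomputed from the enlarged set of distances. The function name is invented for illustration.

```python
def update_center(center, count, new_point):
    """Fold one more normal data point into the mean of `count` earlier points."""
    new_count = count + 1
    new_center = tuple(c + (x - c) / new_count for c, x in zip(center, new_point))
    return new_center, new_count
```

For example, folding (10, 11) into the center (4, 5) of five points yields (5, 6) over six points, the same result as averaging all six points again.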
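A third possible implementation differs from the second in that, in addition to determining whether the first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than the predetermined distance, step 102 further includes: determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determining that the I/O response time is abnormal.
In this way, the N I/O related indicators and the I/O response time itself are both considered when determining whether the I/O response time is abnormal, which makes the result more accurate and further reduces false alarms and missed reports.
Specifically, M N-dimensional data points have been collected in the preceding steps; they are obtained from M samplings, each of which collects a group of N pieces of data corresponding to the N I/O related indicators. Because the N I/O related indicators include the I/O response time, M pieces of I/O response time data are collected after M samplings. The preset I/O response time threshold is such that the probability that the M pieces of I/O response time data are greater than the threshold is less than the probability range acceptable to the user.
The probability range acceptable to the user has a meaning similar to that described above and is not described again here.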
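The I/O response time threshold can be determined in ways that include, but are not limited to, the following. First, it is set manually. Second, the mean and standard deviation of the M pieces of I/O response time data are calculated, and the I/O response time threshold is determined from them. For example, the I/O response time threshold is the mean plus the product of the standard deviation and k, where k is chosen so that the probability that the M pieces of data are greater than the I/O response time threshold is less than the probability range acceptable to the user. With this method the system learns the I/O response time threshold automatically; no manual setting is needed, and the threshold adapts to different environment configurations and service scenarios. Further, because the learned threshold is usually lower than a manually set threshold in the prior art, a slow disk can be found earlier and service damage can be avoided in advance.
Third, the I/O response time threshold can be determined by the following steps: calculating two quantiles, called a first quantile and a second quantile, from the M pieces of data corresponding to the I/O response time; and determining the I/O response time threshold from the first quantile and the second quantile. For example, let the first quantile be the 25th percentile, denoted Q1, and the second quantile be the 75th percentile, denoted Q3. The I/O response time threshold can then be calculated as Q3 + k*(Q3 - Q1), where k is chosen so that the probability that the M pieces of data are greater than the I/O response time threshold is less than the probability range acceptable to the user. With this method the system also learns the I/O response time threshold automatically. Calculating the first quantile and the second quantile from the M pieces of data is well known to persons skilled in the art and is not described here.
Fourth, the I/O response time threshold is determined by the formula A = k * Z_α, where α is a preset constant greater than 0 and less than 1 that represents the probability that the M pieces of data are greater than Z_α, A is the I/O response time threshold, and k and α are chosen so that the probability that the M pieces of data are greater than the I/O response time threshold is less than the probability range acceptable to the user. Specifically, the value of Z_α can be obtained by looking up α in a standard normal distribution table, which in turn gives the I/O response time threshold. Z_α can of course also be computed from α.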
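A fourth possible implementation differs from the second in that, in addition to determining whether the first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than the predetermined distance, step 102 further includes: determining whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determining that the I/O response time is abnormal.
Because load affects the I/O response time of a disk, the N I/O related indicators and the load can both be considered when determining whether the I/O response time is abnormal, which makes the result more accurate and further reduces false alarms and missed reports.
In specific implementation, the preset load range can be determined in ways that include, but are not limited to, the following. First, the preset load range is set manually. Second, if the N I/O related indicators include an indicator that represents load size (for example r/s, w/s, rsec/s, or wsec/s), M pieces of data of that indicator have already been collected in the preceding steps; if the N I/O related indicators do not include such an indicator, M pieces of data of an indicator that represents load size are collected separately. A first range between the smallest and the largest of the M pieces of data can be used as the preset load range. Alternatively, a second range contained in the first range can be used as the preset load range. The first range, of course, lies between the minimum load and the maximum load that the disk can support.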
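The second way of setting the load range can be sketched as follows. The `trim_fraction` parameter is a hypothetical way of choosing a "second range" inside the first one; the description does not say how the narrower range is picked, and the function names are invented for illustration.

```python
def load_range_from_baseline(load_samples, trim_fraction=0.0):
    """Return (low, high) from the M baseline load values.

    trim_fraction = 0 gives the first range (min to max); a positive value drops
    that share of samples at each end to give a narrower second range.
    """
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(load_samples)
    drop = int(len(ordered) * trim_fraction)
    kept = ordered[drop:len(ordered) - drop] if drop else ordered
    return kept[0], kept[-1]


def load_in_range(load, load_range):
    """True if the real-time load lies inside the preset load range."""
    low, high = load_range
    return low <= load <= high
```

Restricting the check to loads seen in the baseline avoids flagging a disk as slow merely because it is being driven harder than anything observed during normal operation.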
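In a fifth possible implementation, three conditions, namely the N I/O related indicators, the I/O response time, and the load, are considered together when determining whether the I/O response time is abnormal, which makes the final result more accurate. Specifically, step 102 includes: using the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determining whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance; determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold; determining whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range; and, if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determining that the I/O response time is abnormal.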
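The combined decision can be sketched as one function in which the response-time and load conditions are optional, so the same code also covers the second, third, and fourth implementations. The `Baseline` container, its field names, and the use of Euclidean distance are assumptions made for this illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Baseline:
    """Values learned from the M baseline points (all names are illustrative)."""
    center: Sequence[float]
    predetermined_distance: float
    response_time_threshold: Optional[float] = None
    load_range: Optional[Tuple[float, float]] = None


def io_response_time_abnormal(point, response_time, load, baseline):
    """Apply the distance test plus whichever optional conditions are configured."""
    if math.dist(point, baseline.center) <= baseline.predetermined_distance:
        return False
    if (baseline.response_time_threshold is not None
            and response_time <= baseline.response_time_threshold):
        return False
    if baseline.load_range is not None:
        low, high = baseline.load_range
        if not (low <= load <= high):
            return False
    return True
```

All configured conditions must hold for the sample to be reported, which is what makes the combined variants less prone to false alarms than the distance test alone.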
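The preset load range can be determined in the same way as described above, and the predetermined distance, the cluster center, and the I/O response time threshold have the same meanings as in the foregoing embodiments, so they are not described again here.
Specifically, in step 103 the detection result can be output, for example, by writing a log, raising an alarm, displaying it on an interface, or reporting it to a processing module. The user or the processing module is thereby notified that a slow disk has appeared and can take measures such as isolating the disk.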
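A minimal sketch of step 103 follows, combining a log entry with an optional report to a processing module. The result fields, the logger name, and the `notify` callback are hypothetical; the description does not define a result format.

```python
import json
import logging

logger = logging.getLogger("slow_disk_detector")  # hypothetical logger name


def build_detection_result(device, response_time_ms, reason):
    """Assemble a detection result that marks the I/O response time as abnormal."""
    return {
        "device": device,
        "io_response_time": "abnormal",
        "response_time_ms": response_time_ms,
        "reason": reason,
    }


def output_detection_result(result, notify=None):
    """Write the result to the log and, if given, pass it to a processing module."""
    logger.warning("slow disk suspected: %s", json.dumps(result, sort_keys=True))
    if notify is not None:
        notify(result)
```

The `notify` callback is where a caller could hook in follow-up actions such as isolating the disk.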
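Based on the same inventive concept, FIG. 3 shows a functional block diagram of an apparatus for detecting a disk according to an embodiment of this application, which is used to implement the method for detecting a disk shown in FIG. 1 and FIG. 2 of the present invention. For the meanings of the terms used in this embodiment, refer to the foregoing embodiments. The apparatus includes: a data collection unit 301, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O) related indicators of a disk, where the N I/O related indicators include the I/O response time of the disk and indicators that affect the I/O response time, the I/O response time is the time from when an application issues an operation request until the disk's response to the request is received, and N is an integer greater than or equal to 2; and a processing unit 302, configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally, and, if the I/O response time is abnormal, to output a detection result, where the detection result indicates that the I/O response time is abnormal.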
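Optionally, the processing unit 302 is configured to: determine which preset interval, of at least two preset intervals of each of the remaining N-1 I/O related indicators other than the I/O response time, the corresponding one of the N-1 pieces of real-time data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O related indicators are at least two sub-ranges obtained by dividing the full range between a first value and a second value that the indicator can support;
determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold corresponding to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.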
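Optionally, the processing unit 302 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; and, if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.
Optionally, the processing unit 302 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators; determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user; and, if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.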
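Optionally, the processing unit 302 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.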
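Optionally, the processing unit 302 is configured to: use the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system are in one-to-one correspondence with the N I/O related indicators;
determine whether a first distance from the data point to a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that the M distance values from the M N-dimensional data points to the cluster center are greater than the predetermined distance is less than a probability range acceptable to a user;
determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that the probability that the M pieces of data corresponding to the I/O response time in the M N-dimensional data points are greater than the I/O response time threshold is less than a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O related indicators is within a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size is within the preset load range, determine that the I/O response time is abnormal.
The variations and specific examples of the method for detecting a disk in the embodiments of FIG. 1 and FIG. 2 also apply to the apparatus for detecting a disk in this embodiment. From the foregoing detailed description of the method, persons skilled in the art can clearly understand how the apparatus in this embodiment is implemented, so for brevity the details are not repeated here.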
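Based on the same inventive concept, FIG. 4 shows a structural block diagram of an electronic device according to an embodiment of this application, which is used to implement the method for detecting a disk shown in FIG. 1 and FIG. 2 of the present invention. For the meanings of the terms used in this embodiment, refer to the foregoing embodiments. The electronic device includes a processor 401, a transmitter 402, a receiver 403, a memory 404, and an I/O interface 405. The processor 401 may be a general-purpose central processing unit (CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits for controlling program execution. There may be one or more memories 404. The memory 404, the receiver 403, and the transmitter 402 are connected to the processor 401 via a bus. The receiver 403 and the transmitter 402 are configured to perform network communication with an external device, specifically over a network such as Ethernet, a radio access network, or a wireless local area network. The receiver 403 and the transmitter 402 may be two physically separate components or one and the same physical component. The I/O interface 405 can be used to connect peripherals such as a mouse and a keyboard.
Specifically, the electronic device has a disk installed in it or connected to it, such as a hard disk, a USB flash drive, or a disk array.
Specifically, the electronic device may be a user-side device or a network-side device.
Specifically, the memory 404 is configured to store data used by the processor 401.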
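the processor 401 is configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O)-related indicators of a disk, where the N I/O-related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until a response of the disk to the request is received; and N is an integer greater than or equal to 2.
The processor 401 is further configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and if the I/O response time is abnormal, output a detection result that indicates that the I/O response time is abnormal.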
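The collect, determine, and output steps performed by the processor can be sketched as a small driver that accepts the collection routine and the decision rule as parameters. Everything here is hypothetical scaffolding: `detect_disk`, the callables, and the text of the detection result are not taken from the source, which leaves the form of the detection result open.

```python
from typing import Callable, Optional, Sequence


def detect_disk(collect: Callable[[], Sequence[float]],
                is_abnormal: Callable[[Sequence[float]], bool]) -> Optional[str]:
    """Collect one group of N real-time values, decide whether the I/O
    response time is abnormal, and return a detection result if it is."""
    sample = collect()
    if len(sample) < 2:
        # The method requires the response time plus at least one influencing indicator.
        raise ValueError("N must be greater than or equal to 2")
    if is_abnormal(sample):
        return "I/O response time abnormal: " + repr(list(sample))
    return None  # Nothing is output when the response time is normal.
```

Any of the optional decision rules described in this embodiment could be passed in as `is_abnormal`.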
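Optionally, the processor 401 is configured to: determine, for each of the N-1 pieces of real-time data corresponding to the remaining N-1 I/O-related indicators other than the I/O response time, which of the at least two preset intervals of the corresponding indicator the data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O-related indicators are at least two sub-ranges obtained by dividing the overall range between a first value and a second value that the indicator can support;
determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold that corresponds to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals;
and if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.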
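The interval-based variant amounts to a table lookup: each influencing indicator is mapped to the index of its preset interval, and the combination of indices selects a response time threshold. The sketch below assumes N = 3 with two influencing indicators; the indicator names, interval boundaries, and threshold values are invented purely for illustration.

```python
import bisect

# Hypothetical interval boundaries for the N-1 influencing indicators.
BOUNDARIES = {
    "iops": [1000, 5000],   # intervals: <1000, 1000..4999, >=5000
    "io_size_kb": [64],     # intervals: <64, >=64
}

# Hypothetical response time thresholds (ms), one per interval combination.
THRESHOLDS_MS = {
    (0, 0): 10, (0, 1): 20,
    (1, 0): 15, (1, 1): 30,
    (2, 0): 25, (2, 1): 50,
}


def interval_index(name, value):
    """Index of the preset interval that the value falls in."""
    return bisect.bisect_right(BOUNDARIES[name], value)


def response_time_abnormal(response_ms, others):
    """Compare the response time with the threshold selected by the
    combination of intervals of the other indicators."""
    combo = tuple(interval_index(name, others[name]) for name in BOUNDARIES)
    return response_ms > THRESHOLDS_MS[combo]
```

In this made-up table a response time of 18 ms is abnormal for small requests at low IOPS, but normal once the request size moves into the larger interval.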
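Optionally, the processor 401 is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.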
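The distance test can be sketched as follows. This passage does not name a clustering algorithm or a distance metric, so the sketch assumes a single cluster (whose center is then simply the per-dimension mean of the M historical points) and Euclidean distance; `cluster_center` and `is_outlier` are hypothetical names. In practice the indicators would probably need to be scaled to comparable units first, which is omitted here.

```python
import math


def cluster_center(points):
    """Center of a single cluster: the per-dimension mean of the M points."""
    m = len(points)
    n = len(points[0])
    return tuple(sum(p[i] for p in points) / m for i in range(n))


def is_outlier(point, center, predetermined_distance):
    """True when the first distance from the point to the cluster center
    is greater than the predetermined distance (Euclidean, by assumption)."""
    return math.dist(point, center) > predetermined_distance


# Hypothetical history of M = 4 two-dimensional points (N = 2).
history = [(0, 0), (2, 0), (0, 2), (2, 2)]
center = cluster_center(history)
```

`math.dist` requires Python 3.8 or later.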
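Optionally, the processor 401 is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.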
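Optionally, the processor 401 is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support;
and if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.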
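Optionally, the processor 401 is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user;
determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support;
and if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.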
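The variations and specific examples of the disk detection method in the foregoing embodiments of FIG. 1 and FIG. 2 also apply to the electronic device of this embodiment. From the foregoing detailed description of the method, a person skilled in the art can clearly understand how the electronic device of this embodiment is implemented, so for brevity the details are not repeated here.
Based on the same inventive concept, an embodiment of the present invention further provides a disk system that includes a disk and a disk controller. The disk controller is configured to perform the method described for FIG. 1 and its embodiments; for details, refer to the foregoing description of FIG. 1 and its embodiments, which is not repeated here.
Further, an embodiment of the present invention provides an electronic device whose structure is as shown in FIG. 4, except that the electronic device in this embodiment also includes the disk system.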
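The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
In the embodiments of the present invention, real-time data is collected for the I/O response time of the disk and for the indicators that affect the I/O response time. Whether the I/O response time is abnormal is then determined from the real-time data of these multiple indicators, which in turn makes it possible to detect whether a slow disk has occurred. Because the embodiments also take into account the other I/O indicators that influence the I/O response time when making this determination, the judgment is closer to actual operating conditions. The detection results of the method are therefore more accurate, and missed reports and false alarms can be reduced.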
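A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is intended to cover them as well.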

Claims (25)

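  1. A method for detecting a disk, comprising:
    collecting a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O)-related indicators of a disk, where the N I/O-related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until a response of the disk to the request is received; and N is an integer greater than or equal to 2;
    determining, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and
    if the I/O response time is abnormal, outputting a detection result that indicates that the I/O response time is abnormal.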
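  2. The method according to claim 1, wherein determining, according to the N pieces of real-time data, whether the I/O response time is abnormal comprises:
    determining, for each of the N-1 pieces of real-time data corresponding to the remaining N-1 I/O-related indicators other than the I/O response time, which of the at least two preset intervals of the corresponding indicator the data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O-related indicators are at least two sub-ranges obtained by dividing the overall range between a first value and a second value that the indicator can support;
    determining whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold that corresponds to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
    if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determining that the I/O response time is abnormal.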
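  3. The method according to claim 1, wherein determining, according to the N pieces of real-time data, whether the I/O response time is abnormal comprises:
    treating the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determining whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; and
    if the first distance is greater than the predetermined distance, determining that the I/O response time is abnormal.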
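  4. The method according to claim 1, wherein determining, according to the N pieces of real-time data, whether the I/O response time is abnormal comprises:
    treating the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determining whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user; and
    if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determining that the I/O response time is abnormal.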
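  5. The method according to claim 1, wherein determining, according to the N pieces of real-time data, whether the I/O response time is abnormal comprises:
    treating the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determining whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determining whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determining that the I/O response time is abnormal.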
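  6. The method according to claim 1, wherein determining, according to the N pieces of real-time data, whether the I/O response time is abnormal comprises:
    treating the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determining whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determining whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user;
    determining whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determining that the I/O response time is abnormal.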
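  7. An apparatus for detecting a disk, comprising:
    a data collection unit, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O)-related indicators of a disk, where the N I/O-related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until a response of the disk to the request is received; and N is an integer greater than or equal to 2; and
    a processing unit, configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and if the I/O response time is abnormal, output a detection result that indicates that the I/O response time is abnormal.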
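  8. The apparatus according to claim 7, wherein the processing unit is configured to: determine, for each of the N-1 pieces of real-time data corresponding to the remaining N-1 I/O-related indicators other than the I/O response time, which of the at least two preset intervals of the corresponding indicator the data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O-related indicators are at least two sub-ranges obtained by dividing the overall range between a first value and a second value that the indicator can support;
    determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold that corresponds to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
    if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.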
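  9. The apparatus according to claim 7, wherein the processing unit is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.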
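  10. The apparatus according to claim 7, wherein the processing unit is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.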
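  11. The apparatus according to claim 7, wherein the processing unit is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.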
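  12. The apparatus according to claim 11, wherein the processing unit is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.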
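  13. An electronic device, comprising:
    a memory, configured to store data used by a processor; and
    the processor, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O)-related indicators of a disk, where the N I/O-related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until a response of the disk to the request is received; and N is an integer greater than or equal to 2;
    where the processor is further configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and if the I/O response time is abnormal, output a detection result that indicates that the I/O response time is abnormal.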
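  14. The electronic device according to claim 13, wherein the processor is configured to: determine, for each of the N-1 pieces of real-time data corresponding to the remaining N-1 I/O-related indicators other than the I/O response time, which of the at least two preset intervals of the corresponding indicator the data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O-related indicators are at least two sub-ranges obtained by dividing the overall range between a first value and a second value that the indicator can support;
    determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold that corresponds to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
    if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.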
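  15. The electronic device according to claim 13, wherein the processor is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.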
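  16. The electronic device according to claim 13, wherein the processor is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.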
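  17. The electronic device according to claim 13, wherein the processor is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.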
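  18. The electronic device according to claim 13, wherein the processor is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.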
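  19. A disk system, comprising:
    a disk; and
    a disk controller, configured to collect a group of N pieces of real-time data in one-to-one correspondence with N input/output (I/O)-related indicators of the disk, where the N I/O-related indicators include the I/O response time of the disk and indicators that affect the I/O response time; the I/O response time is the time from when an application issues an operation request until a response of the disk to the request is received; and N is an integer greater than or equal to 2;
    where the disk controller is further configured to determine, according to the N pieces of real-time data, whether the I/O response time is abnormal, where an abnormal I/O response time indicates that the disk cannot run services normally and a normal I/O response time indicates that the disk can run services normally; and if the I/O response time is abnormal, output a detection result that indicates that the I/O response time is abnormal.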
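  20. The disk system according to claim 19, wherein the disk controller is configured to: determine, for each of the N-1 pieces of real-time data corresponding to the remaining N-1 I/O-related indicators other than the I/O response time, which of the at least two preset intervals of the corresponding indicator the data falls in, so that the N-1 pieces of real-time data fall in N-1 preset intervals respectively, where the at least two preset intervals of each of the N-1 I/O-related indicators are at least two sub-ranges obtained by dividing the overall range between a first value and a second value that the indicator can support;
    determine whether the real-time data corresponding to the I/O response time exceeds the I/O response time threshold that corresponds to the combination of the N-1 preset intervals, where the I/O response time threshold is less than or equal to the maximum I/O response time at which the disk can run services normally when the N-1 pieces of real-time data fall in their respective preset intervals; and
    if the real-time data corresponding to the I/O response time exceeds the I/O response time threshold, determine that the I/O response time is abnormal.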
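  21. The disk system according to claim 19, wherein the disk controller is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance, determine that the I/O response time is abnormal.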
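  22. The disk system according to claim 19, wherein the disk controller is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators; determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user; determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user; and if the first distance is greater than the predetermined distance and the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, determine that the I/O response time is abnormal.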
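  23. The disk system according to claim 19, wherein the disk controller is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.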
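  24. The disk system according to claim 19, wherein the disk controller is configured to: treat the N pieces of real-time data as one data point in an N-dimensional coordinate system, where the N dimensions of the coordinate system correspond one-to-one to the N I/O-related indicators;
    determine whether a first distance between the data point and a cluster center in the N-dimensional coordinate system is greater than a predetermined distance, where the cluster center is the center point obtained by clustering M N-dimensional data points that correspond to the N I/O-related indicators and were collected before the N pieces of real-time data, and the predetermined distance is such that the probability that any of the M distance values between the M N-dimensional data points and the cluster center exceeds the predetermined distance is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the I/O response time exceeds a preset I/O response time threshold, where the I/O response time threshold is such that, among the M pieces of data corresponding to the I/O response time in the M N-dimensional data points, the probability of a value exceeding the I/O response time threshold is below a probability range acceptable to the user;
    determine whether the real-time data corresponding to the indicator that represents load size among the N I/O-related indicators exceeds a preset load range, where the preset load range is all or part of the range between the minimum load and the maximum load that the disk can support; and
    if the first distance is greater than the predetermined distance, the real-time data corresponding to the I/O response time exceeds the preset I/O response time threshold, and the real-time data corresponding to the indicator that represents load size falls within the preset load range, determine that the I/O response time is abnormal.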
  25. An electronic device, comprising:
    the disk system according to any one of claims 19 to 24; and
    a processor, configured to read and write data in the disk.
PCT/CN2016/080376 2015-07-31 2016-04-27 Disk detection method and apparatus WO2017020614A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP16832103.2A EP3321807B1 (en) 2015-07-31 2016-04-27 Disk detection method and device
US15/883,029 US10768826B2 (en) 2015-07-31 2018-01-29 Disk detection method and apparatus
US17/001,594 US20200387311A1 (en) 2015-07-31 2020-08-24 Disk detection method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510465856.0A CN106407052B (zh) 2015-07-31 2015-07-31 Disk detection method and apparatus
CN201510465856.0 2015-07-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/883,029 Continuation US10768826B2 (en) 2015-07-31 2018-01-29 Disk detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2017020614A1 true WO2017020614A1 (zh) 2017-02-09

Family

ID=57942386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/080376 WO2017020614A1 (zh) 2015-07-31 2016-04-27 Disk detection method and apparatus

Country Status (4)

Country Link
US (2) US10768826B2 (zh)
EP (1) EP3321807B1 (zh)
CN (1) CN106407052B (zh)
WO (1) WO2017020614A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116259337A (zh) * 2023-05-15 2023-06-13 合肥联宝信息技术有限公司 磁盘异常检测方法及模型训练方法、相关装置

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844401A (zh) * 2017-11-07 2018-03-27 广州品唯软件有限公司 数据监控方法、装置和计算机存储介质
CN109815037B (zh) * 2017-11-22 2021-07-20 华为技术有限公司 慢盘检测方法和存储阵列
CN110825542B (zh) * 2018-08-07 2023-06-23 深圳爱捷云科技有限公司 一种分布式系统中故障盘的检测方法、装置及检测系统
CN110865896B (zh) * 2018-08-27 2021-03-23 华为技术有限公司 慢盘检测方法及装置、计算机可读存储介质
US10732869B2 (en) * 2018-09-20 2020-08-04 Western Digital Technologies, Inc. Customizing configuration of storage device(s) for operational environment
CN109684140B (zh) * 2018-12-11 2022-07-01 广东浪潮大数据研究有限公司 一种慢盘检测方法、装置、设备及计算机可读存储介质
CN112416639B (zh) * 2020-11-16 2022-08-23 新华三技术有限公司成都分公司 一种慢盘检测方法、装置、设备及存储介质
CN113312218A (zh) * 2021-03-31 2021-08-27 阿里巴巴新加坡控股有限公司 磁盘的检测方法和装置
CN114003477B (zh) * 2021-10-27 2023-08-22 苏州浪潮智能科技有限公司 慢盘诊断信息收集方法、系统、终端及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020138610A1 (en) * 2001-03-26 2002-09-26 Yoshihiko Miyazawa Storage area network system, storage, and data transfer amount monitoring apparatus
US20020188697A1 (en) * 2001-06-08 2002-12-12 O'connor Michael A. A method of allocating storage in a storage area network
CN101533366A (zh) * 2009-03-09 2009-09-16 浪潮电子信息产业股份有限公司 一种服务器性能数据采集与分析的方法
CN102147708A (zh) * 2010-02-10 2011-08-10 成都市华为赛门铁克科技有限公司 一种磁盘检测方法及装置
CN103810062A (zh) * 2014-03-05 2014-05-21 华为技术有限公司 慢盘检测方法和装置
US9037826B1 (en) * 2012-02-29 2015-05-19 Amazon Technologies, Inc. System for optimization of input/output from a storage array

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005017735A1 (ja) * 2003-08-19 2005-02-24 Fujitsu Limited ディスクアレイ装置におけるボトルネックを検出するシステムおよびプログラム
US7707148B1 (en) * 2003-10-07 2010-04-27 Natural Selection, Inc. Method and device for clustering categorical data and identifying anomalies, outliers, and exemplars
US20080010531A1 (en) * 2006-06-12 2008-01-10 Mks Instruments, Inc. Classifying faults associated with a manufacturing process
US8467281B1 (en) 2010-09-17 2013-06-18 Emc Corporation Techniques for identifying devices having slow response times
WO2012049760A1 (ja) * 2010-10-14 2012-04-19 富士通株式会社 ストレージ制御装置における基準時間設定方法
US8984125B2 (en) * 2012-08-16 2015-03-17 Fujitsu Limited Computer program, method, and information processing apparatus for analyzing performance of computer system
CN103488544B (zh) * 2013-09-26 2016-08-17 华为技术有限公司 检测慢盘的处理方法和装置
CN103761180A (zh) * 2014-01-11 2014-04-30 浪潮电子信息产业股份有限公司 一种集群存储中磁盘故障的预防及检测方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3321807A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116259337A (zh) * 2023-05-15 2023-06-13 合肥联宝信息技术有限公司 磁盘异常检测方法及模型训练方法、相关装置
CN116259337B (zh) * 2023-05-15 2023-09-05 合肥联宝信息技术有限公司 磁盘异常检测方法及模型训练方法、相关装置

Also Published As

Publication number Publication date
US10768826B2 (en) 2020-09-08
EP3321807A4 (en) 2018-06-20
CN106407052B (zh) 2019-09-13
EP3321807B1 (en) 2019-09-18
US20200387311A1 (en) 2020-12-10
US20180150239A1 (en) 2018-05-31
EP3321807A1 (en) 2018-05-16
CN106407052A (zh) 2017-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16832103

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016832103

Country of ref document: EP