WO2021159687A1 - Data reconstruction method, storage device and storage medium - Google Patents

Data reconstruction method, storage device and storage medium

Info

Publication number
WO2021159687A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage device
reconstruction
business
reconstruction speed
speed
Prior art date
Application number
PCT/CN2020/111144
Other languages
English (en)
French (fr)
Inventor
鲁鹏
刘金虎
李文思
张瑛
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021159687A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0634 Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of storage technology, and in particular to a data reconstruction method, storage device and storage medium.
  • Data reconstruction is one of the effective ways for storage devices to recover data, and it is also one of the key technologies to ensure storage reliability.
  • Data reconstruction refers to the technology of recovering lost data by using the erasure code (EC) algorithm.
  • With an erasure code (EC), a storage device usually performs redundant encoding on n data strips to generate m parity strips. These (n+m) strips form a stripe.
  • The storage device can store these (n+m) strips scattered across different hard disks.
  • When strips are lost, they can be recovered from n non-lost strips of the same stripe, where both m and n are positive integers.
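  • The stripe layout above can be illustrated with a minimal sketch. It assumes the simplest possible code, a single XOR parity strip (m = 1), which is a simplification chosen for illustration and not the erasure code of the application; the strip contents and the function name `xor_strips` are hypothetical.

```python
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR of equally sized strips (single-parity simplification)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# n = 3 data strips and m = 1 parity strip form one stripe of (n + m) strips.
data_strips = [b"AAAA", b"BBBB", b"CCCC"]
parity_strip = xor_strips(data_strips)
stripe = data_strips + [parity_strip]

# Simulate the strip on a failed disk being lost (index 1 is an arbitrary choice).
lost_index = 1
surviving = [s for i, s in enumerate(stripe) if i != lost_index]

# With XOR parity, the XOR of the n surviving strips reproduces the lost strip.
recovered = xor_strips(surviving)
```

  • Production erasure codes with m > 1 typically use Reed-Solomon style arithmetic rather than plain XOR; the sketch only shows the recover-from-n-survivors idea.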
  • When the storage device determines that a hard disk in the storage device is faulty, the storage device reads a preset reconstruction speed and performs data reconstruction at that fixed speed.
  • The embodiments of the present application provide a data reconstruction method, a storage device, and a storage medium, which help to improve resource utilization or avoid business congestion.
  • The technical solution is as follows:
  • In a first aspect, a data reconstruction method is provided.
  • In this method, a storage device obtains service pressure information, which indicates the service processing pressure of the storage device.
  • According to the service pressure information, the storage device adjusts the first reconstruction speed to obtain a second reconstruction speed, where the first reconstruction speed is the current data reconstruction speed of the storage device, and the second reconstruction speed is negatively correlated with the service processing pressure: the smaller the service processing pressure, the greater the second reconstruction speed.
  • The storage device performs data reconstruction on the data stored in the failed disk in the storage device according to the adjusted second reconstruction speed.
  • The above provides a method for dynamically adjusting the reconstruction speed based on the business processing pressure.
  • The current data reconstruction speed of the storage device is adjusted according to the business pressure information of the storage device, and data reconstruction is performed at the adjusted reconstruction speed.
  • When the business processing pressure of the storage device is small, data reconstruction speeds up, which makes full use of idle resources for data reconstruction, improves the resource utilization of the storage device, shortens the time spent on reconstruction, and improves device reliability.
  • When the business processing pressure of the storage device is large, data reconstruction slows down, which prevents the reconstruction process from occupying too many resources, reduces the impact of data reconstruction on the service processing performance of the storage device, and avoids business congestion on the storage device. Therefore, this method helps the storage device strike a balance between reconstruction speed and service processing performance.
  • The storage device obtains an adjustment step size according to the service pressure information, where the adjustment step size is negatively correlated with the service processing pressure; the storage device then obtains the second reconstruction speed based on the adjustment step size and the first reconstruction speed, where the second reconstruction speed is the sum of the first reconstruction speed and the adjustment step size.
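  • The step-based adjustment can be sketched as follows. The linear mapping from pressure to step, the normalization of pressure to the range 0 to 1, the `max_step` value, and the function names are all assumptions made for illustration; the text only requires that the step be negatively correlated with the pressure and that the second speed be the sum of the first speed and the step.

```python
def adjustment_step(pressure, max_step=20.0):
    """Map a normalized pressure in [0, 1] to an adjustment step.

    Assumed linear rule: the step falls from +max_step at zero pressure
    to -max_step at full pressure, so it is negatively correlated with pressure.
    """
    pressure = min(max(pressure, 0.0), 1.0)
    return max_step * (1.0 - 2.0 * pressure)

def second_reconstruction_speed(first_speed, pressure):
    """Second speed = first (current) speed + adjustment step."""
    return first_speed + adjustment_step(pressure)

low_pressure_speed = second_reconstruction_speed(100.0, 0.0)   # speeds up
high_pressure_speed = second_reconstruction_speed(100.0, 1.0)  # slows down
```

  • A negative step simply expresses a slowdown, so the same sum covers both directions of adjustment.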
  • The first reconstruction speed determined according to the service pressure information may not be the current optimal data reconstruction speed.
  • Therefore, the first reconstruction speed is used as the initial value for reconstruction, and the current value of the performance index fed back by the storage device is also used to determine the adjustment step size through feedback adjustment. Adjusting in this way helps to reach the optimal data reconstruction speed under the actual business pressure quickly and reduces the ramp-up time of the reconstruction speed.
  • The storage device performs data reconstruction on the data stored in the failed disk at a first time point according to the second reconstruction speed. The first time point is obtained by shifting a second time point, taken as the reference, by a preset time length, and the second time point is the time point at which the service processing pressure changes.
  • When the service processing pressure of the storage device switches, the storage device reduces the impact of the reconstruction speed adjustment on services, and reduces the performance fluctuation of the storage device, by adjusting in advance or adjusting with a lag.
  • In one case, the second time point is a time point when the service processing pressure drops, and the first time point is later than the second time point.
  • By adjusting the reconstruction speed with a lag, the storage device can ensure a smooth transition from high business pressure to low business pressure and ensure that the input/output (IO) requests under the existing load are processed.
  • In another case, the second time point is a time point when the service processing pressure rises, and the first time point is earlier than the second time point.
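  • The relationship between the two time points can be sketched as below. Times are plain numbers in seconds, and the function and parameter names are hypothetical; only the direction of the shift (earlier when pressure rises, later when pressure drops) comes from the text.

```python
def first_time_point(second_time_point, pressure_rises, preset_shift):
    """Return when the adjusted reconstruction speed should take effect.

    second_time_point: when the service processing pressure changes.
    pressure_rises:    True if the pressure rises at that point, False if it drops.
    preset_shift:      preset time length used for the shift.
    """
    if pressure_rises:
        # Advance adjustment: slow reconstruction down before the load arrives.
        return second_time_point - preset_shift
    # Lag adjustment: speed up only after the existing IO requests have drained.
    return second_time_point + preset_shift

advance = first_time_point(1000, pressure_rises=True, preset_shift=30)
lag = first_time_point(1000, pressure_rises=False, preset_shift=30)
```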
  • The storage device inputs historical business pressure information into a prediction model, where the historical business pressure information indicates the business processing pressure of the storage device at a historical time point; the prediction model processes the historical business pressure information and outputs the business pressure information.
  • The business pressure information reflects the pattern by which the business processing pressure of the storage device changes over time.
  • By collecting statistics on historical business pressure information during past operation and applying a prediction model, the storage device can mine this pattern from the historical business pressure information and use it to predict future business pressure information.
  • Business pressure information predicted in this way is more accurate, so determining the reconstruction speed from it helps to improve the accuracy of the reconstruction speed.
  • The historical service pressure information includes at least one of the following: the central processing unit (CPU) utilization of the storage device at a historical time point; or the input/output operations per second (IOPS) of the storage device at a historical time point; or the disk bandwidth of the storage device at a historical time point; or the network interconnection protocol of the storage device at a historical time point.
  • IP: Internet Protocol
  • GC: garbage collection
  • The service processing pressure of the storage device is characterized by the resources of the storage device.
  • When the service processing pressure is large, the resource usage of the storage device is large; taking the resource usage as the business pressure information therefore accurately describes the size of the business processing pressure. This helps the reconstruction speed adapt dynamically under complex customer scenarios and changing system resources, and improves adaptability across scenarios and devices.
  • The storage device first determines whether the performance index of the storage device at the first reconstruction speed satisfies a preset condition; when the preset condition is met, the storage device adjusts the first reconstruction speed according to the service pressure information.
  • In this way, the storage device can determine whether the reconstruction speed currently needs to be adjusted according to how much the current reconstruction speed affects device performance, which improves the flexibility of adjusting the reconstruction speed.
  • The preset condition includes: the difference between the current value of the performance index and the expected value of the performance index is greater than a threshold.
  • The current value of the performance index reflects the performance of the storage device at the current reconstruction speed.
  • The expected value of the performance index can reflect the impact of data reconstruction on the performance of the storage device.
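  • The preset condition can be sketched as a simple threshold check. Treating the "difference" as an absolute difference, using latency in milliseconds as the performance index, and the function name are assumptions for illustration only.

```python
def needs_speed_adjustment(current_value, expected_value, threshold):
    """True when the performance index deviates from its expected value by more than the threshold.

    The absolute difference is an assumption; the text only says "difference".
    """
    return abs(current_value - expected_value) > threshold

# Hypothetical latency figures in milliseconds.
adjust = needs_speed_adjustment(current_value=12.0, expected_value=8.0, threshold=2.0)
keep = needs_speed_adjustment(current_value=8.5, expected_value=8.0, threshold=2.0)
```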
  • In a second aspect, a storage device is provided. The storage device includes a first processor, a second processor, and one or more hard disks. The first processor is configured to obtain business pressure information and to adjust the first reconstruction speed according to the business pressure information to obtain the second reconstruction speed. The second processor is configured to perform data reconstruction on the data stored in the failed disk among the one or more hard disks.
  • In a third aspect, a storage device is provided. The storage device includes a processor, and the processor is configured to execute the data reconstruction method provided in the foregoing first aspect or any one of the optional manners of the first aspect.
  • A computer-readable storage medium is provided. The storage medium stores at least one instruction, and the instruction is read by a processor to make the storage device execute the data reconstruction method provided in the first aspect or any one of the optional manners of the first aspect.
  • A chip is provided. When the chip runs on a storage device, the storage device executes the data reconstruction method provided in the first aspect or any one of the optional manners of the first aspect.
  • A computer program product is provided. When the computer program product runs on a storage device, the storage device executes the data reconstruction method provided in the first aspect or any one of the optional manners of the first aspect.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a data reconstruction method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a relationship between a change in service processing pressure and a change in reconstruction speed according to an embodiment of the present application.
  • FIG. 5 is a software architecture diagram of a data reconstruction method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a reconstruction speed adjustment method provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a data reconstruction method provided by an embodiment of the present application.
  • FIG. 8 is a flowchart of a data reconstruction method provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data reconstruction device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a storage device provided by an embodiment of the present application.
  • In this application, words such as “first” and “second” are used to distinguish between identical or similar items that have basically the same function and effect. It should be understood that there is no logical or temporal dependency between “first”, “second”, and “nth”, and that these words do not limit the number or the execution order. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another.
  • For example, the first reconstruction speed may be referred to as the second reconstruction speed, and similarly the second reconstruction speed may be referred to as the first reconstruction speed. Both the first reconstruction speed and the second reconstruction speed are reconstruction speeds and, in some cases, may be separate and different reconstruction speeds.
  • The size of the sequence number of each process does not imply an order of execution.
  • The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • Determining B according to A does not mean that B is determined only according to A; B can also be determined according to A and/or other information.
  • the data reconstruction method provided in the embodiments of the present application can be applied to a scenario where a storage device performs data reconstruction.
  • the data reconstruction method of the embodiment of the present application can be applied to a scenario where a centralized storage device or a distributed storage system performs data reconstruction. The following briefly introduces the scenario of data reconstruction.
  • Data reconstruction is one of the effective ways for storage devices to restore data and one of the key technologies to ensure storage reliability.
  • The capacity of a single disk continues to increase, which lengthens disk reconstruction time. For large-capacity disks in particular, the reconstruction time is too long, resulting in low data reliability.
  • In view of this, a strategy for dynamically adjusting the reconstruction speed based on business pressure is provided.
  • A characterization model of reconstruction resource consumption is built to establish the relationship between reconstruction speed and resource consumption.
  • The initial reconstruction speed is obtained from the predicted resource margin, and a dynamic step adjustment feedback system based on business pressure then quickly adjusts the initial reconstruction speed to the optimal reconstruction speed. Therefore, by predicting business pressure and using the reconstruction resource consumption characterization model, other services are not affected and the remaining resources of the device are used as much as possible to increase the reconstruction speed, while business continuity is also effectively ensured.
  • the system architecture 100 is an example of a centralized storage device.
  • the centralized storage device is, for example, a storage array.
  • the storage array includes one or more controllers, and each controller includes one or more hard disks. When the hard disk in the storage array fails, the controller can perform data reconstruction on the data stored in the failed disk.
  • the controller of the storage array is also called the storage controller, and the controller is commonly called the head.
  • the controller of the storage array is, for example, the controller 101 in FIG. 1, and the hard disks in the storage array are the hard disk 102, the hard disk 103, the hard disk 104 and the hard disk 105 in FIG. 1. Among them, the ellipsis in FIG.
  • The hard disk is, for example, a solid state drive (SSD), a mechanical hard disk drive (HDD), etc.
  • the hard disk in the system architecture 100 is a smart hard disk, and the smart hard disk has its own processor, memory and other resources required for calculation processing.
  • the system architecture 200 is an example of a distributed storage system.
  • the distributed storage system includes multiple storage nodes.
  • the storage node is, for example, a server, and the server includes one or more hard disks.
  • the storage node is the server 201, the server 202, the server 203, or the server 204 in FIG.
  • the distributed storage system may optionally also include computing nodes.
  • The computing nodes include but are not limited to storage clients, Meta Data Controller (MDC) nodes, Elastic Compute Service (ECS) nodes, Volume Backup Service (VBS) nodes, etc.
  • Computing nodes are, for example, hosts, servers, personal computers, or other devices with computing processing capabilities.
  • the computing node is the server 205 or the server 206 in the system architecture 200.
  • FIG. 3 is a flowchart of a data reconstruction method provided by an embodiment of the present application, and the method is applied to a storage device.
  • For example, the storage device is a controller in a storage array. In the scenario of reconstructing the data of a failed disk in the storage array, the controller executes Embodiment 1 to adjust the reconstruction speed based on the service processing pressure of the storage array.
  • For example, the storage device is the controller 101 in the system architecture 100.
  • The controller 101 executes Embodiment 1 to adjust the reconstruction speed based on the business processing pressure of the system architecture 100.
  • As another example, the storage device is a storage node in a distributed storage system.
  • When the storage node reconstructs failed-disk data in the distributed storage system, it executes Embodiment 1 to adjust the reconstruction speed based on the business processing pressure of the distributed storage system.
  • For example, the storage device is the server 201 in the system architecture 200.
  • The server 201 executes Embodiment 1 to adjust the reconstruction speed based on the business processing pressure of the system architecture 200.
  • Embodiment 1 includes the following S301 to S304.
  • the storage device obtains service pressure information.
  • The business pressure information is used to indicate the business processing pressure of the storage device. For example, when the storage device receives an access request, it needs to perform a data reading service, and when it receives a write request, it needs to perform a data storage service. While processing services, the storage device faces business processing pressure.
  • The business pressure information can indicate the size of the business processing pressure of the storage device, so that the reconstruction speed can be adjusted in combination with that pressure.
  • The data form of the business pressure information can be, but is not limited to, a numerical value, a vector, a matrix, or another form.
  • The service pressure information includes resource occupancy information of the storage device, and the resource occupancy information is, for example, a resource utilization rate, a resource usage amount, or a resource remaining amount.
  • The resource includes but is not limited to at least one of computing resources, storage resources, and network resources.
  • Computing resources are, for example, the processors of the storage device, such as a general-purpose central processing unit (CPU) or a graphics processing unit (GPU); storage resources are, for example, the hard disks of the storage device; and network resources are, for example, the network card and bandwidth of the storage device.
  • The business pressure information includes but is not limited to any of the following (1) to (12).
  • IOPS: input/output operations per second
  • The disk bandwidth of the storage device, where the disk bandwidth includes at least one of a read bandwidth and a write bandwidth.
  • IP: Internet Protocol
  • The bandwidth of the disk array card of the storage device, where the storage device includes one or more disk array cards.
  • Disk array cards include but are not limited to at least one of a Serial Attached SCSI (SAS) array card, a Small Computer System Interface (SCSI) array card, a Serial Advanced Technology Attachment (SATA) array card, and an Integrated Drive Electronics (IDE) array card.
  • SAS: Serial Attached SCSI
  • SCSI: Small Computer System Interface
  • SATA: Serial Advanced Technology Attachment
  • IDE: Integrated Drive Electronics
  • The read/write ratio includes at least one of the proportion of read requests among IO requests or the proportion of write requests among IO requests.
  • The service processing pressure of the storage device is characterized by the resources of the storage device.
  • When the service processing pressure is large, the resource usage of the storage device is large; taking the resource usage as the business pressure information therefore accurately describes the size of the business processing pressure. This helps the reconstruction speed adapt dynamically under complex customer scenarios and changing system resources, and improves adaptability across scenarios and devices.
  • In some cases, the business pressure information is one of (1) to (12) above; for example, the business pressure information is IOPS, or the business pressure information is CPU utilization.
  • Alternatively, the business pressure information is a combination of two or more of (1) to (12) above.
  • When the business pressure information includes multiple items of (1) to (12), one dimension of the business pressure information can be one of (1) to (12) above.
  • The ways of combining (1) to (12) above include, but are not limited to, feature splicing.
  • Feature splicing is a term in the field of machine learning. It refers to combining features of multiple dimensions by horizontal or vertical splicing to obtain one piece of data that contains the features of every dimension. In plain terms, feature splicing can be thought of as splicing multiple vectors into one large matrix.
  • One column of the business pressure information can be any one of (1) to (12) above, or one row of the business pressure information can be any one of (1) to (12) above.
  • For example, the business pressure information includes CPU utilization, IOPS, and disk bandwidth, which are its three dimensions: CPU utilization is the first column of the business pressure information, IOPS is the second column, and disk bandwidth is the third column.
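  • The three-column example can be sketched with plain Python lists. The sample values and variable names are made up for illustration; each inner list is one sample, and each column is one dimension of the business pressure information.

```python
# One value per sampling time for each dimension (illustrative numbers only).
cpu_utilization = [0.35, 0.40, 0.80]
iops = [12000, 15000, 42000]
disk_bandwidth_mbps = [300, 350, 900]

# Horizontal splicing: place the three vectors side by side as columns,
# so each row holds (CPU utilization, IOPS, disk bandwidth) for one sample.
business_pressure = [list(row) for row in zip(cpu_utilization, iops, disk_bandwidth_mbps)]

# Column 0 is CPU utilization, column 1 is IOPS, column 2 is disk bandwidth.
first_column = [row[0] for row in business_pressure]
```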
  • Feature splicing is only one optional way of combining data of different dimensions in the business pressure information; it is not mandatory.
  • Alternatively, multiple items of (1) to (12) above are combined through feature fusion and introduced into the business pressure information.
  • Feature fusion is a term in the field of machine learning. It refers to multiplying or adding features from multiple dimensions into a single value that combines the features of every dimension.
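  • Feature fusion can be sketched as a weighted sum that collapses several dimensions into one value. The weights, the assumption that the features are already normalized to comparable ranges, and the function name are illustrative choices, not taken from the application.

```python
def fuse_features(features, weights):
    """Fuse several feature values into one by multiplying each by a weight and adding."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))

# Hypothetical normalized CPU utilization and normalized IOPS, equally weighted.
fused_pressure = fuse_features([0.5, 0.25], [0.5, 0.5])
```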
  • The above (1) to (12) are examples of data included in the business pressure information.
  • In some embodiments, the business pressure information includes data other than (1) to (12) above, or some of (1) to (12) above are ignored or not used.
  • For example, one or more of (1) to (12) above are replaced with other data, which includes but is not limited to free storage space, the proportion of hot data, and so on.
  • In some embodiments, the storage device predicts service pressure information based on the historical pattern of the service.
  • S3011 to S3013 below illustrate this implementation.
  • The storage device obtains historical service pressure information.
  • The historical business pressure information is used to indicate the business processing pressure of the storage device at a historical time point.
  • The granularity of historical time points includes but is not limited to seconds, minutes, hours, and so on. Taking a granularity of seconds as an example, the historical business pressure information indicates the business processing pressure of the storage device in the past second.
  • The historical business pressure information includes but is not limited to any one of the following (1) to (12).
  • The disk bandwidth of the storage device at a historical time point, where the disk bandwidth includes at least one of a read bandwidth and a write bandwidth.
  • The read/write ratio of the IO requests received by the storage device in a unit time period at the historical time point, where the read/write ratio includes at least one of the proportion of read requests among IO requests or the proportion of write requests among IO requests.
  • The storage device collects statistics on historical business pressure information during past operation and saves the historical business pressure information to the hard disk.
  • The storage device later reads the historical business pressure information from the hard disk.
  • The historical service pressure information can be collected periodically; specifically, the storage device collects historical service pressure information once every statistical period, where the time unit of the statistical period is, for example, minutes or hours.
  • The storage device can also collect historical business pressure information in real time.
  • the storage device inputs historical business pressure information into the prediction model.
  • the prediction model is, for example, a function.
  • the input parameters of the prediction model include historical business pressure information
  • the output parameters of the prediction model include business pressure information at a future point in time.
  • the future time point refers to a time point later than the historical time point, and there is a certain size of time interval between the future time point and the historical time point.
  • the minimum value of the time interval between the future time point and the historical time point is, for example, 1 minute.
  • business pressure information can be predicted in several ways.
  • the following uses situation 1 and situation 2 as examples.
  • Situation 1: Forecast business pressure information based on all historical business pressure information collected over a period of time.
  • the future time point and the historical time point belong to the same time period.
  • the historical time point includes 8 o'clock in the evening on January 31.
  • the business pressure information at 8 o'clock in the evening on February 1 is predicted.
  • the historical business pressure information input to the prediction model is in the form of time series
  • the business pressure information output by the prediction model is also in the form of time series.
  • the historical business pressure information includes N1 data items that correspond to N1 historical time points. Each of the N1 data items indicates the business processing pressure of the storage device at one historical time point, and the N1 data items are arranged in the historical business pressure information in the order of the corresponding historical time points.
  • the business pressure information output by the prediction model includes N2 data items that correspond to N2 future time points. Each of the N2 data items indicates the business processing pressure of the storage device at one future time point, and the N2 data items are arranged in the business pressure information in the order of the corresponding future time points.
  • N1 and N2 are both positive integers.
  • N1 is greater than 1
  • N2 is greater than or equal to 1.
  • the types of prediction models include a variety of situations, which are illustrated in the following through situations (a) and (b).
  • the forecast model is a time series forecast model.
  • the prediction model is an autoregressive integrated moving average (ARIMA) model, an exponential smoothing model, a period identification model, and so on.
  • the prediction model is a machine learning model, for example, a linear fitting model, a logistic regression model, or a deep learning model.
  • the predictive model is, for example, a convolutional neural network, a long short-term memory (Long Short-Term Memory, LSTM) network, and so on.
  • the historical business pressure information is used as a sample, and the business pressure information at a future time point is used as the target value for training to obtain the deep learning model.
  • the storage device processes the historical business pressure information through the predictive model, and outputs the business pressure information.
  • the process of processing through the predictive model can be different.
  • the forecasting model is an ARIMA model
  • the process of the ARIMA model processing historical business pressure information includes the process of autoregressive calculation and the process of moving average.
  • the predictive model is a deep learning model
  • the process of the deep learning model to process historical business pressure information includes a process of feature extraction and a process of classification based on features.
  • the business pressure information reflects the law of changes in the business processing pressure of the storage device over time.
  • the storage device collects historical business pressure information during historical operation and uses the prediction model to mine this pattern from the historical business pressure information, so as to predict future business pressure information.
  • the business pressure information predicted by this method is more accurate. Therefore, when the reconstruction speed is determined from the business pressure information, the accuracy of the reconstruction speed is improved.
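As a minimal sketch of the prediction step, the function below forecasts N2 future pressure values from N1 time-ordered historical values using simple exponential smoothing, which is one of the model families named above. The function name `forecast_pressure`, the smoothing factor `alpha`, and the use of plain floats for pressure values are assumptions made for illustration; the source does not specify a concrete model configuration.

```python
from typing import List


def forecast_pressure(history: List[float], n_future: int, alpha: float = 0.5) -> List[float]:
    """Forecast n_future pressure values from a time-ordered history.

    Simple exponential smoothing: the level is updated as
    level = alpha * x + (1 - alpha) * level for each observation x,
    and the final level is used as a flat forecast for every future point.
    """
    if len(history) < 2:
        raise ValueError("need more than one historical data item (N1 > 1)")
    if n_future < 1:
        raise ValueError("need at least one future time point (N2 >= 1)")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return [level] * n_future
```

A flat forecast is the simplest choice; a model that captures daily periodicity (for example, the 8 o'clock example above) would need a seasonal component that this sketch omits.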
  • the storage device determines that the performance index of the storage device at the first reconstruction speed meets a preset condition.
  • the reconstruction speed of the storage device is no longer fixed, but can be dynamically adjusted. After the reconstruction speed is adjusted, the value of the reconstruction speed will change. In order to distinguish and describe different reconstruction speeds, the reconstruction speed before adjustment is called the first reconstruction speed, and the reconstruction speed after the adjustment is called the second reconstruction speed.
  • the first reconstruction speed is the current data reconstruction speed of the storage device.
  • the first reconstruction speed is an initial value of the reconstruction speed during the data reconstruction process.
  • the first reconstruction speed is determined according to service pressure information.
  • the first reconstruction speed is a recommended reconstruction speed value obtained according to the service pressure information.
  • the recommended reconstruction speed at the future time point can be provided, and when the time reaches the future time point, data reconstruction is performed with the recommended reconstruction speed as the initial value.
  • the recommended reconstruction speed described here is the first reconstruction speed.
  • the storage device inputs historical service pressure information into the reconstruction speed determination model, processes the historical service pressure information through the reconstruction speed determination model, and outputs the first reconstruction speed.
  • the reconstruction speed determination model is a machine learning model.
  • the reconstruction speed determination model is a linear fitting model, a logistic regression model, a support vector regression (SVR) model, a deep neural network (DNN) model, a convolutional neural network model, etc.
  • R represents business pressure information
  • the data format of R is, for example, a matrix or a number.
  • y0 represents the first reconstruction speed (for example, the initial value of the reconstruction speed), and f represents the mapping relationship, that is, the function.
  • the initial value of the reconstruction speed can also be determined at different levels, for example by multiplying y0 by a level coefficient, or in other ways.
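Read together, the symbol definitions above suggest a mapping of the following form. This is a reconstruction from the definitions of R, y0 and f; the level coefficient symbol \lambda is a hypothetical name introduced here for the "level coefficient" mentioned above.

```latex
y_0 = f(R), \qquad y_0^{\text{level}} = \lambda \cdot f(R)
```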
  • the business pressure information is in the form of a time series
  • the business pressure information indicates the business processing pressure at multiple future time points
  • the storage device determines, based on the business pressure information, the first reconstruction speed corresponding to each of the multiple future time points, and saves the correspondence between each future time point and its first reconstruction speed.
  • the storage device starts a timer, and when the time reaches a future time point, according to the pre-stored correspondence relationship, the first reconstruction speed corresponding to the future time point is used as the initial value to start data reconstruction.
  • the business pressure information indicates that the business processing pressure of the storage device will reach pressure 1 at 8 o'clock in the evening and pressure 2 at 9 o'clock in the evening.
  • the storage device recommends reconstruction speed A according to pressure 1, and reconstruction speed B according to pressure 2. Then, when the time reaches 8 o'clock in the evening, the storage device starts data reconstruction with reconstruction speed A as the initial value. At this time, the first reconstruction speed is reconstruction speed A. Similarly, when the time reaches 9 o'clock in the evening, the storage device starts data reconstruction with reconstruction speed B as the initial value. At this time, the first reconstruction speed is reconstruction speed B. In this way, the storage device starts reconstruction at each moment with the reconstruction speed corresponding to the business pressure information as the initial value, so that the reconstruction speed at each moment matches the business processing pressure at that moment, thereby flexibly adjusting the data reconstruction speed.
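The stored correspondence between future time points and first reconstruction speeds can be sketched as a lookup table. In the sketch below, time points are integer hours, pressures are normalized to the range 0 to 1, and `example_f` is a hypothetical stand-in for the mapping f; none of these choices come from the source.

```python
from typing import Callable, Dict, Optional


def build_speed_schedule(
    predicted_pressure: Dict[int, float],
    speed_for_pressure: Callable[[float], float],
) -> Dict[int, float]:
    """Map each future time point to its recommended (first) reconstruction speed."""
    return {t: speed_for_pressure(p) for t, p in predicted_pressure.items()}


def initial_speed_at(schedule: Dict[int, float], now: int) -> Optional[float]:
    """Return the speed of the latest scheduled time point already reached, if any."""
    reached = [t for t in schedule if t <= now]
    if not reached:
        return None
    return schedule[max(reached)]


def example_f(pressure: float) -> float:
    """Hypothetical mapping f: higher pressure gives a lower recommended speed."""
    return 100.0 * (1.0 - pressure)


# Pressure 0.75 predicted for 20:00 and 0.25 for 21:00 (illustrative values).
schedule = build_speed_schedule({20: 0.75, 21: 0.25}, example_f)
```

With these illustrative values, the 20:00 entry plays the role of reconstruction speed A and the 21:00 entry plays the role of reconstruction speed B.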
  • the storage device determines whether to adjust the reconstruction speed according to the degree of influence of the current reconstruction speed on the performance of the device.
  • the performance of the device is characterized by the value of the performance index of the device.
  • the storage device determines whether the performance index at the first reconstruction speed meets the preset condition. If the performance index of the storage device at the first reconstruction speed meets the preset condition, the storage device executes S303 to adjust the reconstruction speed. If the performance index of the storage device at the first reconstruction speed does not meet the preset condition, the storage device keeps the current reconstruction speed unchanged.
  • Performance indicators are used to indicate the performance of storage devices.
  • the performance indicator includes the latency of an IO request. The latency of an IO request is, for example, the time from when the storage device receives the IO request to when it completes the corresponding data read or write.
  • the value of the performance index is an average value in a unit time period.
  • the storage device collects the total number of IO requests processed in a unit time period and the total time spent processing these IO requests, and divides the total time by the total number to obtain the average latency of an IO request in the unit time period.
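The averaging step can be written out directly. The function name and the choice to report zero latency for an idle period with no requests are assumptions for illustration.

```python
def average_latency(total_time: float, request_count: int) -> float:
    """Average IO latency over one unit time period.

    total_time: total time spent processing the IO requests in the period.
    request_count: number of IO requests processed in the period.
    """
    if total_time < 0 or request_count < 0:
        raise ValueError("total_time and request_count must be non-negative")
    if request_count == 0:
        # Assumption: an idle period is reported as zero latency.
        return 0.0
    return total_time / request_count
```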
  • the preset conditions include: the gap between the current value of the performance indicator and the expected value of the performance indicator is greater than the threshold.
  • the expected value of the performance index is preset by the user. For example, a logical unit number (LUN) is created in the storage device (a LUN is a logical disk virtualized by the storage hardware, and the operating system of the storage device usually treats a LUN as a usable hard disk), and the user expects that, even while the storage device performs data reconstruction, the latency of IO requests delivered to the LUN does not exceed M. Then M is the expected value of the performance index, where M is a positive number.
  • the process of judging whether the performance index meets the preset condition includes: the storage device collects the current value of the performance index, reads the preset expected value of the performance index, calculates the difference between the current value and the expected value, and compares the difference with the threshold.
  • the storage device executes S303 to adjust the reconstruction speed .
  • the storage device keeps the current reconstruction speed unchanged.
  • the steps performed by the storage device include, but are not limited to, any of the following methods I to II.
  • Method I The storage device executes S303 to adjust the reconstruction speed.
  • Method II The storage device does not adjust the reconstruction speed, but keeps the current reconstruction speed unchanged.
  • the threshold used by the storage device is a preset value.
  • the threshold is used to determine the difference between the current value of the performance index and the expected value of the performance index.
  • the storage device keeps the current reconstruction speed unchanged by configuring the adjustment step length to 0.
  • the storage device calculates the adjustment step length and adjusts the reconstruction speed according to the adjustment step length.
  • e represents the threshold and e is a constant.
  • in one possible implementation, the storage device adjusts the reconstruction speed according to the adjustment step size, and in another possible implementation, the storage device configures the adjustment step size as zero so as to keep the current reconstruction speed unchanged. That is, this embodiment does not limit whether the storage device adjusts the reconstruction speed when the gap is equal to the threshold e.
  • the effect achieved at least includes: during the data reconstruction process of the storage device, the current value of the performance index can reflect the performance of the storage device at the current reconstruction speed.
  • the expected value of the performance index can reflect the maximum impact of allowing data reconstruction on the performance of the storage device.
  • if the difference between the current value of the performance index and the expected value of the performance index is greater than the threshold, it indicates that the current reconstruction speed has already affected the storage device's performance.
  • performance indicators can still meet expectations. And if the difference between the current value of the performance index and the expected value of the performance index is less than the threshold, by maintaining the current reconstruction speed for data reconstruction, the resources of the storage device can be fully utilized for reconstruction, thereby improving resource utilization.
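The preset-condition check described above can be sketched as a single comparison. The function name and the `adjust_on_equal` flag are hypothetical; the flag stands in for the choice between method I and method II when the gap is exactly equal to the threshold, which is an interpretation of the text rather than something it states explicitly.

```python
def should_adjust(
    current: float,
    expected: float,
    threshold: float,
    adjust_on_equal: bool = False,
) -> bool:
    """Decide whether the reconstruction speed should be adjusted.

    Returns True when the gap between the current and expected values of the
    performance index exceeds the threshold, False when it is below it, and
    adjust_on_equal when the gap equals the threshold.
    """
    gap = abs(expected - current)
    if gap > threshold:
        return True
    if gap < threshold:
        return False
    return adjust_on_equal
```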
  • S302 is an optional step.
  • after the storage device executes S301, the storage device does not execute S302, but skips S302 and directly executes S303.
  • the storage device adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.
  • the second reconstruction speed is negatively related to business processing pressure.
  • the negative correlation is, for example, that the second reconstruction speed is inversely proportional to the business processing pressure, and the meaning of the negative correlation includes but is not limited to the following two aspects.
  • the smaller the business processing pressure the greater the second reconstruction speed. That is, if the business processing pressure of the storage device is reduced, the data reconstruction speed will increase, and the data reconstruction process of the storage device will speed up. In this way, when the storage device is in a service idle state, the service processing pressure of the storage device is small. At this time, by increasing the reconstruction speed of the storage device, the idle resources of the storage device can be fully utilized and the resource utilization rate of the storage device can be improved. In addition, since the reconstruction speed can be improved, the reliability of the storage device can be improved, and the cost performance of the storage device can be improved.
  • the greater the business processing pressure, the lower the second reconstruction speed. That is, if the business processing pressure of the storage device increases, the data reconstruction speed decreases, and the data reconstruction process of the storage device slows down. In this way, if the storage device is in a busy state, the business processing pressure of the storage device is heavy. At this time, reducing the reconstruction speed of the storage device reduces the impact of the data reconstruction process on the performance of the storage device, so as to ensure that the impact of data reconstruction on the business stays within a controllable range, avoiding business congestion and equipment downtime caused by data reconstruction.
  • the capacity of the storage device may be improved along with the upgrade process, causing changes in the business processing pressure of the storage device.
  • the service pressure information after the storage device is upgraded is obtained, where the service pressure information is used to indicate the service processing pressure after the storage device is upgraded, and the first reconstruction speed is adjusted according to the service pressure information after the upgrade to obtain the second reconstruction speed.
  • the current capability of the storage device can be dynamically sensed, and the reconstruction speed can be adjusted in combination with the current capability of the storage device, so that the reconstruction speed is adapted to the upgraded storage device.
  • when the business processing pressure increases after the storage device is upgraded, by executing this method, data reconstruction automatically slows down, thereby avoiding business congestion after the storage device is upgraded. When the business processing pressure decreases after the storage device is upgraded, by executing this method, data reconstruction automatically speeds up, thereby increasing the resource utilization of the storage device after the upgrade.
  • How to adjust the reconstruction speed according to business pressure information includes multiple implementation methods.
  • the reconstruction speed is adjusted through feedback adjustment.
  • feedback adjustment means that the output of a system is in turn used as an input parameter to adjust the system.
  • while performing data reconstruction at the first reconstruction speed, the storage device can adjust the first reconstruction speed according to the current value of the performance index and the business pressure information. This feedback adjustment helps to dynamically converge on the optimal reconstruction speed.
  • the manner of feedback adjustment includes the following S3031 to S3032.
  • the storage device determines the adjustment step.
  • the adjustment step size refers to the step size of the reconstruction speed adjustment process, that is, the amplitude of the reconstruction speed change after the reconstruction speed is adjusted once, that is, the increment of the reconstruction speed.
  • the adjustment step includes but is not limited to the following cases 1 to 2.
  • the storage device performs the step of adjusting the reconstruction speed multiple times, thereby gradually approaching the second reconstruction speed from the first reconstruction speed.
  • the adjustment step used may be different.
  • the first adjustment step is used for adjustment
  • the second adjustment step is used for adjustment
  • the first adjustment step is greater than the second adjustment step.
  • the ramp-up time refers to the time required for adjustment according to the adjustment step.
  • How to determine the adjustment step includes multiple implementation manners, which are illustrated below through implementation manner one to implementation manner two.
  • Implementation method 1: Obtain the current value of the performance index, and obtain the adjustment step size according to the gap between the current value of the performance index and the expected value of the performance index.
  • the adjustment step is positively correlated with the gap between the current value of the performance index and the expected value of the performance index. That is, the larger the gap between the current value of the performance index and the expected value of the performance index, the larger the adjustment step size.
  • the adjustment step size is, for example, the following formula (2):
  • Δy represents the adjustment step size
  • C is a constant
  • t0 represents the expected value of the performance index
  • t1 represents the current value of the performance index at the current reconstruction speed.
  • | | means to take the absolute value.
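A form of formula (2) that is consistent with the symbols defined above, assuming the step size is simply proportional to the gap, would be:

```latex
\Delta y = C \cdot \lvert t_0 - t_1 \rvert \tag{2}
```

Under this assumed form, doubling the gap between the expected and current values of the performance index doubles the adjustment step, which matches the stated positive correlation.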
  • the rate of change of the reconstruction speed is dynamically adjusted according to the performance index of the storage device.
  • Implementation method 2: Obtain the adjustment step size according to the business pressure information, where the adjustment step size is negatively correlated with the business processing pressure.
  • the adjustment step size is inversely proportional to the business processing pressure. The greater the business processing pressure, the smaller the adjustment step size, and the smaller the business processing pressure, the larger the adjustment step size.
  • implementation method 2 can also introduce the performance index into the calculation. For example, the storage device obtains the current value of the performance index and the business pressure information, and obtains the adjustment step length according to the business pressure information, the current value of the performance index, and the expected value of the performance index.
  • the following formula (3) is used for calculation to obtain the adjustment step length.
  • Δy represents the adjustment step size
  • K represents a constant.
  • R represents business pressure information.
  • t0 represents the expected value of the performance index, and t1 represents the current value of the performance index at the current reconstruction speed.
  • | | means to take the absolute value.
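One form of formula (3) that is consistent with the symbols above would divide the gap by the business pressure. This is an assumption: it treats R as a positive scalar and takes the inverse proportionality to business processing pressure literally.

```latex
\Delta y = K \cdot \frac{\lvert t_0 - t_1 \rvert}{R} \tag{3}
```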
  • the adjustment step size is preset, and each time the reconstruction speed is adjusted, the preset adjustment step size is used to adjust the reconstruction speed.
  • the storage device obtains a second reconstruction speed according to the adjustment step size and the first reconstruction speed.
  • the second reconstruction speed is the sum of the first reconstruction speed and the adjustment step length.
  • the second reconstruction speed is expressed by the following formula (4).
  • y represents the second reconstruction speed, that is, the adjusted reconstruction speed.
  • y0 represents the first reconstruction speed, for example, the initial value of the reconstruction speed.
  • sgn is the sign function.
  • Δy represents the adjustment step length.
  • t0 represents the expected value of the performance index, and t1 represents the current value of the performance index at the current reconstruction speed. * denotes multiplication.
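From the symbols listed above, formula (4) plausibly has the following shape. The argument of the sign function is an assumption, chosen so that a latency above the expected value (t1 greater than t0) lowers the reconstruction speed and a latency below it raises the speed.

```latex
y = y_0 + \operatorname{sgn}(t_0 - t_1) \cdot \Delta y \tag{4}
```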
  • the first reconstruction speed determined according to the service pressure information may not be the current optimal data reconstruction speed.
  • not only is the first reconstruction speed used as the initial value for reconstruction, but the current value of the performance index fed back by the storage device is also used to determine the adjustment step size through feedback adjustment. Adjusting in this way helps to quickly reach the optimal data reconstruction speed under the actual business pressure and reduces the ramp-up time of the reconstruction speed.
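One feedback adjustment step can be sketched as follows, assuming a step size proportional to the latency gap and a sign taken from the difference between expected and current latency. Both the proportional form and the sign convention are assumptions made for this sketch, as are the function names and the clamp at zero.

```python
def sign(x: float) -> int:
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def next_speed(y0: float, t0: float, t1: float, c: float, e: float) -> float:
    """Compute the adjusted reconstruction speed from one feedback sample.

    y0: current (first) reconstruction speed
    t0: expected latency, t1: measured latency at speed y0
    c:  proportionality constant for the step size (assumed form)
    e:  threshold on the latency gap
    """
    gap = abs(t0 - t1)
    if gap <= e:
        # Step length configured as zero: keep the current speed.
        return y0
    step = c * gap
    # Latency above expectation lowers the speed; below raises it.
    return max(0.0, y0 + sign(t0 - t1) * step)
```

Calling `next_speed` repeatedly with fresh latency measurements gives the iterative approach to the second reconstruction speed; the step shrinks as the measured latency approaches the expected value.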
  • the storage device performs data reconstruction on the data stored in the failed disk in the storage device according to the second reconstruction speed.
  • when the service processing pressure of the storage device switches, the storage device uses advance adjustment or lag adjustment to reduce the impact of the reconstruction speed adjustment on the service and to reduce the performance fluctuation of the storage device.
  • for ease of description, the time point at which the storage device performs data reconstruction according to the adjusted reconstruction speed is referred to as the first time point, and the time point at which the service processing pressure of the storage device changes is referred to as the second time point. If it is detected that the service processing pressure of the storage device changes at the second time point, the second time point is offset by a preset duration to obtain the first time point. At the first time point, data reconstruction is performed on the data stored in the faulty disk at the second reconstruction speed.
  • whether the storage device adopts the pre-adjustment strategy or the post-adjustment strategy is determined according to the switching mode of the business pressure.
  • the service pressure switching modes include high-to-low switching and low-to-high switching, which are described in detail below through case A and case B.
  • Case A: If the business processing pressure of the storage device drops, that is, the business processing pressure switches from high to low, the storage device delays the adjustment of the reconstruction speed. Specifically, if the storage device detects that the service processing pressure drops at the second time point, the storage device uses the second time point as a reference and shifts it backward by a preset duration to obtain the first time point, which is later than the second time point. At the first time point, the storage device performs data reconstruction on the data stored in the failed disk at the second reconstruction speed. For example, referring to Figure 4, the first time point is, for example, t_a, the second time point is, for example, t_b, and the preset duration is, for example, ΔT.
  • the business processing pressure of the storage device drops.
  • the reconstruction speed of the storage device increases at time t_b.
  • the process of adjusting the reconstruction speed is realized by sending a reconstruction speed command. The storage device sends a reconstruction speed command carrying the second reconstruction speed at a time point after the time point at which the pressure drops, so as to achieve lag adjustment.
  • the storage device adjusts the reconstruction speed through hysteresis, which can ensure a smooth transition from high service pressure to low service pressure, and ensure that IO requests under the existing load are processed.
  • Case B: If the business processing pressure of the storage device rises, that is, the business processing pressure switches from low to high, the storage device adjusts the reconstruction speed in advance. Specifically, if the storage device detects that the service processing pressure increases at the second time point, the storage device uses the second time point as a reference and shifts it forward by a preset duration to obtain the first time point, which is earlier than the second time point. At the first time point, the storage device performs data reconstruction on the data stored in the failed disk at the second reconstruction speed.
  • the first time point is for example t_c
  • the second time point is for example t_d
  • the preset duration is for example ΔT.
  • it can be seen from the business pressure change curve that when the time is t_c, the business processing pressure of the storage device rises. In this case, the adjustment is advanced by ΔT, and at time t_d, the reconstruction speed of the storage device has already begun to decrease.
  • the process of adjusting the reconstruction speed is realized by sending a reconstruction speed command. The storage device realizes advance adjustment by sending a reconstruction speed command carrying the second reconstruction speed at a time point before the time point at which the pressure rises.
  • the above description, in which the range of lag adjustment and the range of advance adjustment are both the preset duration, is only an illustration.
  • in other implementations, the range of lag adjustment or the range of advance adjustment is not the preset duration, but is determined according to the business processing pressure. For example, the greater the increase in business processing pressure, the greater the range of advance adjustment, and the greater the decline in business processing pressure, the greater the range of lag adjustment.
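The choice between advance and lag adjustment can be sketched as a shift of the pressure-change time point. The function name, the numeric time representation, and the fixed `offset` argument (standing in for the preset duration) are assumptions; advance adjustment also presupposes that the pressure change time is known beforehand, for example from predicted business pressure information.

```python
def adjustment_time(
    change_time: float,
    old_pressure: float,
    new_pressure: float,
    offset: float,
) -> float:
    """Return the first time point, at which the new reconstruction speed takes effect.

    change_time is the second time point, at which the business pressure changes.
    A rising pressure moves the adjustment earlier (advance adjustment);
    a falling pressure moves it later (lag adjustment).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if new_pressure > old_pressure:
        return change_time - offset  # advance: slow reconstruction before load rises
    if new_pressure < old_pressure:
        return change_time + offset  # lag: speed up only after load has dropped
    return change_time
```

Making `offset` grow with the size of the pressure change would correspond to the pressure-dependent adjustment range described above.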
  • the foregoing embodiment provides a dynamic adjustment strategy for reconfiguration speed based on business pressure.
  • the dynamic adjustment strategy is applied in other task scenarios related to business pressure, such as applied in a GC task or a data replication task.
  • the current GC speed is adjusted to obtain the target GC speed.
  • the target GC speed is negatively related to the business processing pressure.
  • when the business processing pressure of the storage device is low, the GC speed is increased. When the business processing pressure is high, the GC speed is decreased, so as to realize dynamic adjustment of the GC speed based on the business pressure and reduce the impact of the execution of the GC task on the business.
  • the current data copy speed is adjusted to obtain the target data copy speed.
  • the target data copy speed is negatively related to the business processing pressure. Then, when the business processing pressure of the storage device is low, the data replication task speeds up. When the business processing pressure of the storage device is high, the data replication task slows down, so that the data copy speed is dynamically adjusted based on the business pressure, and the impact of the execution of the data replication task on the business is reduced.
  • the above-mentioned method embodiments can be implemented through the collaborative work of different modules of the storage device.
  • This embodiment provides an implementation of a data reconstruction system.
  • the logical function architecture of the data reconstruction system is shown in Figure 5.
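  • the data reconstruction system includes a plurality of software function modules, such as a pressure prediction module 501, a resource characterization module 502, a system scheduling module 503, a performance evaluation module 504, a step length calculation module 505, a reconstruction control module 506, and a reconstruction calculation module 507.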
  • the system scheduling module 503 may also be referred to as a quality of service (Quality of Service, QoS) module.
  • the pressure prediction module 501 and the resource characterization module 502 are located in other processors outside the controller, for example, in the GPU.
  • the system scheduling module 503, the performance evaluation module 504, the step length calculation module 505, the reconstruction control module 506, and the reconstruction calculation module 507 are located in the controller.
  • the pressure prediction module 501, the resource characterization module 502, the system scheduling module 503, the performance evaluation module 504, the step length calculation module 505, the reconstruction control module 506, and the reconstruction calculation module 507 are all in the controller.
  • The overall reconstruction computation flow carried out by these functional modules includes S311 to S318. S311 to S318 are examples of the foregoing S301 to S304.
  • the controller collects historical service pressure information and saves it to the hard disk.
  • the historical service pressure information is, for example, equipment resource data: CPU utilization rate and so on.
  • the controller determines that the hard disk is damaged.
  • the controller reads the historical business pressure information from the hard disk storing the historical business pressure information, and sends the historical business pressure information to the pressure prediction module 501.
  • the pressure prediction module 501 predicts the future business pressure, obtains business pressure information, and sends the business pressure information to the resource characterization module 502.
  • the resource characterization module 502 calculates the recommended value of the reconstruction speed, that is, the first reconstruction speed, according to the business pressure information, and sends the first reconstruction speed for each time point to the system scheduling module 503 (QoS).
  • calculations related to pressure prediction and resource model characterization can be offloaded, such as with GPU or other external devices.
  • the system scheduling module 503 sets the data reconstruction speed to the first reconstruction speed, and sends the first reconstruction speed to the reconstruction control module 506 to notify the reconstruction control module 506 to perform data reconstruction at the first reconstruction speed.
  • the reconstruction control module 506 reads the reconstruction dependent data (the reconstruction dependent data is the input data of the data reconstruction process), and the reconstruction control module 506 puts the read data into the reconstruction calculation module 507.
  • the reconstruction control module 506 saves data and modifies metadata.
  • the controller collects the performance index of the device, and judges the difference between the current reconstruction speed and the optimal reconstruction speed according to the current value of the performance index.
  • the step length calculation module 505 calculates the adjustment step length, and sends the adjustment step length to the system scheduling module 503, and the system scheduling module 503 updates the reconstruction speed.
  • S316 to S318 can be repeated until the reconstruction is completed.
  • FIG. 6 shows a flow chart of adjusting dynamic feedback.
  • the adjusting process of dynamic feedback includes S321 to S325.
  • S323 Judge the impact of reconstruction with the current value on the system. Specifically, calculate
  • e, execute S324 or execute S325.
  • S324 The increment of the reconstruction speed is sgn(0), that is, the increment of the reconstruction speed is zero, that is, the reconstruction speed is kept unchanged, and return to S322.
  • This embodiment provides a method for dynamically adjusting the reconstruction speed based on the service processing pressure.
  • the current data reconstruction speed of the storage device is adjusted according to the service pressure information of the storage device, and data reconstruction is performed at the adjusted reconstruction speed.
  • When the storage device is idle and its business processing pressure is low, data reconstruction speeds up, so that idle resources are fully used for data reconstruction, the resource utilization of the storage device improves, the time spent on reconstruction is reduced, and device reliability improves.
  • When the storage device is busy and its business processing pressure is high, data reconstruction slows down, which prevents the reconstruction process from occupying too many resources, reduces the impact of data reconstruction on the business processing performance of the storage device, and avoids business congestion on the storage device. Therefore, this method helps the storage device strike a balance between reconstruction speed and service processing performance.
  • the first embodiment introduces a method for dynamically adjusting the reconstruction speed based on business processing pressure.
  • the execution subject of each step in the first embodiment may be any hardware in the storage device. In other words, this application does not limit which hardware of the storage device executes each step in the first embodiment.
  • the storage device includes multiple processors.
  • S301 to S303 and S304 are respectively executed by different processors of the storage device.
  • different processors of the storage device share the task of predicting service processing pressure and the task of data reconstruction, thereby reducing the load on the processor responsible for data reconstruction.
  • FIG. 7 is a flowchart of a data reconstruction method provided by an embodiment of the present application.
  • the method is applied to a storage device, and the storage device includes a first processor, a second processor, and one or more hard disks.
  • the first processor and the second processor may be any different processors.
  • the first processor is used to undertake processing tasks corresponding to S3001 to S3003, and the second processor is used to undertake processing tasks corresponding to S3004.
  • the first processor is a GPU, an embedded neural-network processing unit (NPU), or a CPU, or the first processor may also be an integrated circuit.
  • the first processor may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the first processor may be a single-core processor or a multi-core processor.
  • the second processor is, for example, a CPU, a network processor (NP), a microprocessor, or may be one or more integrated circuits for implementing the solution of the present application, for example, an ASIC, a PLD, or a combination thereof.
  • the above-mentioned PLD can be CPLD, FPGA, GAL or any combination thereof.
  • the second processor may be a single-core processor or a multi-core processor.
  • the first processor is a GPU and the second processor is a CPU.
  • the second embodiment includes the following S401 to S405.
  • S401 is the same as S301
  • S402 is the same as S302
  • S403 is the same as S303
  • S405 is the same as S304.
  • S401 The first processor obtains service pressure information of the storage device.
  • S402 The first processor determines that the performance index of the storage device at the first reconstruction speed meets a preset condition.
  • S403 The first processor adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.
  • S404 The first processor sends the second reconstruction speed to the second processor.
  • S405 The second processor receives the second reconstruction speed from the first processor, and performs data reconstruction on the data stored in the failed disk in the storage device according to the second reconstruction speed.
  • the first processor executes the task of obtaining business pressure information and the task of determining the adjusted reconstruction speed, and the second processor performs data reconstruction according to the reconstruction speed obtained by the first processor. Because the two processors share the tasks of predicting the service processing pressure, calculating the reconstruction speed, and reconstructing data, the prediction and speed-calculation tasks are offloaded to the first processor. This reduces the processing pressure and overhead of the second processor, so that the second processor can reserve more computing power for other tasks, which helps improve its performance.
  • the second embodiment described above offloads S301 to S303 to other processors inside the storage device for execution.
  • the cloud device and the storage device communicate through the network.
  • the cloud device is, for example, a host, a server, a personal computer, or other devices with computing processing capabilities.
  • the following embodiment 3 describes the process of data reconstruction when the cloud device undertakes the work of S301 to S303.
  • the third embodiment relates to how the storage device interacts with the cloud device to dynamically adjust the reconstruction speed based on the business processing pressure.
  • the third embodiment is applied in a distributed storage system.
  • the cloud device and the storage device are different node devices in the same distributed storage system.
  • the storage device is a storage node in the distributed storage system.
  • the cloud device is a computing node in a distributed storage system, for example, a cloud device is a storage client in a distributed storage system.
  • the cloud device in the third embodiment is the server 205 or the server 206 in the system architecture 200
  • the storage device in the third embodiment is the server 201, the server 202, the server 203 or the server 204 in the system architecture 200.
  • FIG. 8 is a flowchart of a data reconstruction method provided by an embodiment of the present application, and the interaction body of the method includes a cloud device and a storage device.
  • the third embodiment includes the following S501 to S505.
  • S501 is the same as S301
  • S502 is the same as S302
  • S503 is the same as S303
  • S505 is the same as S304.
  • S501 The cloud device obtains business pressure information.
  • S502 The cloud device determines that the performance index of the storage device at the first reconstruction speed meets a preset condition.
  • S503 The cloud device adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.
  • S504 The cloud device sends the second reconstruction speed to the storage device.
  • S505 The storage device receives the second reconstruction speed from the cloud device, and performs data reconstruction on the data stored in the failed disk in the storage device according to the second reconstruction speed.
  • the task of obtaining business pressure information and the task of determining the adjusted reconstruction speed are performed by the cloud device, and the data is reconstructed by the storage device according to the reconstruction speed obtained by the cloud device.
  • Because the cloud device and the storage device share the tasks of predicting business processing pressure, computing the reconstruction speed, and reconstructing data, the prediction and speed-computation tasks are offloaded to the cloud device. This reduces the processing pressure and overhead of the storage device, so that the storage device can reserve more computing power for other tasks, which helps improve its performance.
  • the data reconstruction method of the embodiment of the present application is described above, and the data reconstruction device of the embodiment of the present application is described below. It should be understood that the data reconstruction device has any function of the storage device in the foregoing method.
  • FIG. 9 is a schematic structural diagram of a data reconstruction device provided by an embodiment of the present application.
  • the data reconstruction device 900 includes: an acquisition module 901 for performing S301; an adjustment module 902 for performing S303; and a data reconstruction module 903 for performing S304.
  • the data reconstruction device 900 further includes a determining module for performing S302.
  • the data reconstruction apparatus 900 corresponds to the storage device in the first embodiment, the second embodiment, or the third embodiment.
  • the modules in the data reconstruction apparatus 900 and the other operations and/or functions described above are used to implement the various steps and methods implemented by the storage device in the first, second, or third embodiment. For specific details, refer to the first, second, or third embodiment above; for brevity, the details are not repeated here.
  • the division into the above-mentioned functional modules is used only as an example of how the data reconstruction device 900 performs data reconstruction.
  • in practice, the above-mentioned functions can be allocated to different functional modules as required; that is, the internal structure of the data reconstruction device can be divided into different functional modules to complete all or part of the functions described above.
  • the data reconstruction device provided in the foregoing embodiment belongs to the same concept as the foregoing embodiment 1, embodiment 2, or embodiment 3.
  • FIG. 10 is a schematic structural diagram of a storage device provided by an embodiment of the present application.
  • the storage device 1000 includes a first processor 1001, a second processor 1011, a communication bus 1002, a memory 1003, at least one communication interface 1004, and one or more hard disks.
  • the one or more hard disks include, for example, a hard disk 102, a hard disk 103, a hard disk 104, and a hard disk 105.
  • the first processor 1001 is configured to execute S401 to S404.
  • the second processor 1011 is configured to execute S405.
  • the communication bus 1002 is used to transfer information between different components in the storage device 1000.
  • the communication bus 1002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication bus 1002 includes, but is not limited to, a peripheral component interconnect express (PCIe) bus, a memory fabric, Fibre Channel (FC), small computer system interface (SCSI), Ethernet, etc.
  • the memory 1003 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or another magnetic storage device; or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1003 may exist independently and is connected to the processor 1001 through a communication bus 1002.
  • the memory 1003 may also be integrated with the processor 1001.
  • the memory 1003 is used to store the program code 1010 for executing the solution of the present application, and the processor 1001 can execute the program code 1010 stored in the memory 1003. That is, the storage device 1000 can implement the data reconstruction method provided by the method embodiment through the processor 1001 and the program code 1010 in the memory 1003.
  • the communication interface 1004 uses any device such as a transceiver for communicating with other devices or communication networks.
  • the communication interface 1004 includes a wired communication interface, and may also include a wireless communication interface.
  • the wired communication interface may be, for example, an Ethernet interface.
  • the Ethernet interface can be an optical interface, an electrical interface, or a combination thereof.
  • the wireless communication interface may be a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
  • the transceiver is used to communicate with other devices or communication networks, and the way of network communication can be but not limited to Ethernet, wireless access network (RAN), wireless local area networks (WLAN), etc.
  • the storage device 1000 may further include an output device 1006 and an input device 1007.
  • the output device 1006 communicates with the processor 1001 and can display information in a variety of ways.
  • the output device 1006 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • the input device 1007 communicates with the processor 1001, and can receive user input in a variety of ways.
  • the input device 1007 may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the first processor 1001 and the second processor 1011 are integrated together; that is, the first processor 1001 and the second processor 1011 are the same processor of the storage device 1000, and that processor executes S301 to S304.
  • the storage device uses the same processor to perform the task of predicting business processing pressure and the task of data reconstruction.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the unit described as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application is essentially or the part that contributes to the existing technology, or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium. It includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • the computer program product includes one or more computer program instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program instructions can be passed from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

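This application provides a data reconstruction method, a storage device, and a storage medium, and belongs to the field of storage technologies. This application provides a method for dynamically adjusting the reconstruction speed based on service processing pressure. When the storage device is idle and its service processing pressure is low, data reconstruction speeds up, so that idle resources are fully used for data reconstruction, the resource utilization of the storage device improves, the time spent on reconstruction is reduced, and device reliability improves. When the storage device is busy and its service processing pressure is high, data reconstruction slows down, which prevents the reconstruction process from occupying too many resources and reduces the service congestion that data reconstruction causes on the storage device. The method therefore helps the storage device strike a balance between reconstruction speed and service processing performance.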

Description

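Data reconstruction method, storage device, and storage medium
This application claims priority to Chinese Patent Application No. 202010085179.0, filed on February 10, 2020 and entitled "Data reconstruction method, storage device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of storage technologies, and in particular, to a data reconstruction method, a storage device, and a storage medium.
Background
Data reconstruction is one of the effective ways for a storage device to recover data and one of the key technologies for ensuring storage reliability. Data reconstruction refers to recovering lost data by using an erasure code (EC) algorithm. For example, a storage device typically performs redundancy coding on n data strips to generate m parity strips. These (n+m) strips form a stripe, and the storage device can distribute the (n+m) strips across different hard disks. When a hard disk fails and the data on it is lost, the lost strips can be reconstructed from n surviving strips as long as the number of lost strips is not greater than m. Here, m and n are both positive integers.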
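As a minimal sketch of the recoverability rule above, the following Python code uses a single XOR parity strip (that is, m = 1) as a stand-in for a real erasure code. The XOR scheme and the function names `xor_strips`, `build_stripe`, and `reconstruct` are assumptions made for illustration; the application does not state which EC algorithm is used.

```python
from functools import reduce
from typing import List, Optional


def xor_strips(strips: List[bytes]) -> bytes:
    """XOR equal-length strips byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def build_stripe(data_strips: List[bytes]) -> List[bytes]:
    """Return the n data strips followed by one XOR parity strip (m = 1)."""
    return list(data_strips) + [xor_strips(data_strips)]


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which lost strips are marked as None.

    With m = 1, reconstruction only works if at most one strip is lost.
    """
    lost = [i for i, strip in enumerate(stripe) if strip is None]
    if len(lost) > 1:
        raise ValueError("more strips lost than parity strips (m = 1)")
    if not lost:
        return list(stripe)
    # The XOR of all surviving strips equals the single missing strip.
    survivors = [strip for strip in stripe if strip is not None]
    repaired = list(stripe)
    repaired[lost[0]] = xor_strips(survivors)
    return repaired
```

A production erasure code such as Reed-Solomon tolerates m > 1 lost strips; the XOR case is shown only because it keeps the arithmetic visible.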
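Currently, when a storage device determines that one of its hard disks has failed, it reads a preset reconstruction speed and performs data reconstruction at that fixed speed.
With this approach, because the reconstruction speed stays fixed throughout data reconstruction, an idle storage device cannot use its idle resources for data reconstruction, which results in low resource utilization; and on a busy storage device the reconstruction process occupies too many resources, which causes service congestion on the storage device.
Summary
Embodiments of this application provide a data reconstruction method, a storage device, and a storage medium, which help improve resource utilization or avoid service congestion. The technical solutions are as follows:
According to a first aspect, a data reconstruction method is provided. In the method, a storage device obtains service pressure information, which indicates the service processing pressure of the storage device. The storage device adjusts a first reconstruction speed according to the service pressure information to obtain a second reconstruction speed, where the first reconstruction speed is the current data reconstruction speed of the storage device, and the second reconstruction speed is negatively correlated with the service processing pressure: the lower the service processing pressure, the higher the second reconstruction speed. The storage device then performs data reconstruction on the data stored on a failed disk in the storage device at the adjusted second reconstruction speed.
The foregoing provides a method for dynamically adjusting the reconstruction speed based on service processing pressure. The current data reconstruction speed of the storage device is adjusted according to the service pressure information of the storage device, and data reconstruction is performed at the adjusted speed. When the storage device is idle and its service processing pressure is low, data reconstruction speeds up, so that idle resources are fully used for data reconstruction, the resource utilization of the storage device improves, the time spent on reconstruction is reduced, and device reliability improves. When the storage device is busy and its service processing pressure is high, data reconstruction slows down, which prevents the reconstruction process from occupying too many resources, reduces the impact of data reconstruction on the service processing performance of the storage device, and avoids service congestion. The method therefore helps the storage device strike a balance between reconstruction speed and service processing performance.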
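Optionally, when adjusting the reconstruction speed, the storage device obtains an adjustment step according to the service pressure information, where the adjustment step is negatively correlated with the service processing pressure. The storage device then obtains the second reconstruction speed according to the adjustment step and the first reconstruction speed, where the second reconstruction speed is the sum of the first reconstruction speed and the adjustment step.
With this optional approach, the following applies. Service pressure information predicted from historical service pressure information may deviate somewhat from the actual service processing pressure, so the first reconstruction speed determined from the service pressure information may not be the optimal data reconstruction speed at the moment. In the foregoing method, reconstruction not only starts from the first reconstruction speed as the initial value but also uses feedback: the adjustment step is determined from the current value of the performance index reported by the storage device, and the speed is adjusted from the initial value by the adjustment step. This helps reach the optimal data reconstruction speed under the actual service pressure quickly and shortens the ramp-up time of the reconstruction speed.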
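The step-based adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the pressure is assumed to be normalized to the range [0, 1], the linear mapping from pressure to step, the `max_step` bound, and the clamping at `min_speed` are all hypothetical choices, and the names `adjustment_step` and `second_speed` do not come from the application.

```python
def adjustment_step(pressure: float, max_step: float = 50.0) -> float:
    """Map a normalized pressure in [0, 1] to a step in [-max_step, +max_step].

    The step is negatively correlated with the pressure: low pressure gives a
    positive step (speed up), high pressure gives a negative step (slow down).
    The linear form is an assumption made for this sketch.
    """
    if not 0.0 <= pressure <= 1.0:
        raise ValueError("pressure must be normalized to [0, 1]")
    return max_step * (1.0 - 2.0 * pressure)


def second_speed(first_speed: float, pressure: float,
                 max_step: float = 50.0, min_speed: float = 0.0) -> float:
    """Second reconstruction speed = first reconstruction speed + step.

    Clamping at min_speed is an added safeguard, not part of the source text.
    """
    return max(min_speed, first_speed + adjustment_step(pressure, max_step))
```

For example, with a first reconstruction speed of 100 (in arbitrary units) and a pressure of 0.25, the step is +25 and the second reconstruction speed is 125.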
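Optionally, during data reconstruction, the storage device performs data reconstruction on the data stored on the failed disk at the second reconstruction speed at a first time point. The first time point is obtained by offsetting a second time point by a preset duration, and the second time point is the time point at which the service processing pressure changes.
With this optional approach, when the service processing pressure of the storage device switches, the storage device adjusts the reconstruction speed early or late, which reduces the impact of the speed adjustment on services and reduces performance fluctuation of the storage device.
Optionally, the second time point is the time point at which the service processing pressure falls, and the first time point is later than the second time point.
With this optional approach, when the service processing pressure falls, the storage device delays the adjustment of the reconstruction speed, which ensures a smooth transition from high to low service pressure and ensures that the input/output (IO) requests under the existing load are completed.
Optionally, the second time point is the time point at which the service processing pressure rises, and the first time point is earlier than the second time point.
With this optional approach, when the service processing pressure is about to rise, the reconstruction speed is lowered in advance, so that it has already dropped to a reasonable value by the time the pressure rises. This avoids the service congestion that would result from continuing to reconstruct data at a high speed under high service pressure.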
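The early and late adjustment described above amounts to shifting the switch time by a preset offset. The following sketch shows one way to compute the first time point; the function name `adjustment_time` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def adjustment_time(change_time: datetime, pressure_rises: bool,
                    offset: timedelta) -> datetime:
    """Return the first time point, at which the new speed takes effect.

    change_time is the second time point, when the pressure changes.
    If the pressure is about to rise, the speed is lowered early;
    if the pressure falls, the speed is raised late.
    """
    if pressure_rises:
        return change_time - offset
    return change_time + offset
```

With a 5-minute offset and a pressure rise predicted for 20:00, the speed would be lowered at 19:55; for a pressure drop at 20:00, the speed would be raised at 20:05.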
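Optionally, when obtaining the service pressure information, the storage device inputs historical service pressure information into a prediction model, where the historical service pressure information indicates the service processing pressure of the storage device at historical time points; the prediction model processes the historical service pressure information and outputs the service pressure information.
With this optional approach, the service pressure information reflects how the service processing pressure of the storage device changes over time. By collecting historical service pressure information during past operation and applying the prediction model, the storage device can mine this pattern from the historical information and predict future service pressure information. Service pressure information predicted in this way is relatively accurate, so determining the reconstruction speed from it helps improve the accuracy of the reconstruction speed.
Optionally, the historical service pressure information includes at least one of the following: the central processing unit (CPU) utilization of the storage device at a historical time point; the input/output operations per second (IOPS) of the storage device at a historical time point; the disk bandwidth of the storage device at a historical time point; the Internet Protocol (IP) enclosure bandwidth of the storage device at a historical time point; the garbage collection (GC) concurrency characteristics of the storage device at a historical time point; or the deduplication and compression characteristics of the storage device at a historical time point.
With the foregoing implementation, the resources of the storage device are used to characterize its service processing pressure. For example, when performing data reconstruction makes the storage device consume a large amount of resources, the resource usage of the storage device is high; using resource usage as the service pressure information therefore captures the magnitude of the service processing pressure accurately. This helps the dynamic adjustment of the reconstruction speed adapt to complex customer scenarios and changing system resources, and improves adaptability to scenarios and devices.
Optionally, the storage device first determines that a performance index of the storage device at the first reconstruction speed meets a preset condition, and only when the preset condition is met does it adjust the first reconstruction speed according to the service pressure information.
With the foregoing implementation, during data reconstruction the storage device can decide whether to adjust the reconstruction speed according to how much the current reconstruction speed affects device performance, which makes the adjustment more flexible.
Optionally, the preset condition includes: the gap between the current value of the performance index and the expected value of the performance index is greater than a threshold.
With the foregoing implementation, during data reconstruction the current value of the performance index reflects the performance of the storage device at the current reconstruction speed, and the expected value reflects the maximum impact that data reconstruction is allowed to have on the performance of the storage device. When the gap between the current value and the expected value is greater than the threshold, the current reconstruction speed is already affecting performance significantly. Adjusting the reconstruction speed then reduces that impact, helps prevent data reconstruction from causing a sharp performance drop, and ensures that the performance index still meets expectations during reconstruction. If the gap is smaller than the threshold, continuing to reconstruct at the current speed makes full use of the storage device's resources and improves resource utilization.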
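The preset condition above reduces to a threshold test on the gap between the current and expected values of a performance index. A minimal sketch follows; the function name `needs_adjustment` and the use of an absolute difference as the "gap" are assumptions, since the text does not define how the gap is measured.

```python
def needs_adjustment(current: float, expected: float, threshold: float) -> bool:
    """Return True when the gap between the current and expected value of a
    performance index (for example, IO latency) exceeds the threshold.

    The gap is taken as an absolute difference, which is an assumption.
    """
    return abs(current - expected) > threshold
```

For instance, with an expected latency of 8 ms and a threshold of 2 ms, a measured latency of 12 ms would trigger an adjustment, while 9 ms would leave the current reconstruction speed unchanged.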
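According to a second aspect, a storage device is provided. The storage device includes a first processor, a second processor, and one or more hard disks. The first processor is configured to obtain service pressure information and to adjust a first reconstruction speed according to the service pressure information to obtain a second reconstruction speed. The second processor is configured to perform data reconstruction on the data stored on a failed disk among the one or more hard disks. For specific details of the storage device provided in the second aspect, refer to the first aspect or any optional manner of the first aspect; details are not repeated here.
According to a third aspect, a storage device is provided. The storage device includes a processor configured to execute the data reconstruction method provided in the first aspect or any optional manner of the first aspect. For specific details of the storage device provided in the third aspect, refer to the first aspect or any optional manner of the first aspect; details are not repeated here.
According to a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the instruction is read by a processor to cause a storage device to execute the data reconstruction method provided in the first aspect or any optional manner of the first aspect.
According to a fifth aspect, a chip is provided. When the chip runs on a storage device, the storage device executes the data reconstruction method provided in the first aspect or any optional manner of the first aspect.
According to a sixth aspect, a computer program product is provided. When the computer program product runs on a storage device, the storage device executes the data reconstruction method provided in the first aspect or any optional manner of the first aspect.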
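Brief Description of Drawings
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 3 is a flowchart of a data reconstruction method according to an embodiment of this application;
FIG. 4 is a schematic diagram of the relationship between changes in service processing pressure and changes in reconstruction speed according to an embodiment of this application;
FIG. 5 is a software architecture diagram of a data reconstruction method according to an embodiment of this application;
FIG. 6 is a flowchart of a reconstruction speed adjustment method according to an embodiment of this application;
FIG. 7 is a flowchart of a data reconstruction method according to an embodiment of this application;
FIG. 8 is a flowchart of a data reconstruction method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a storage device according to an embodiment of this application.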
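Description of Embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the following describes the implementations of this application in further detail with reference to the accompanying drawings.
In this application, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are basically the same. It should be understood that "first", "second", and "n-th" have no logical or temporal dependency on one another and do not limit quantity or execution order. It should also be understood that although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms, which are only used to distinguish one element from another. For example, without departing from the scope of the various examples, a first reconstruction speed may be called a second reconstruction speed, and similarly a second reconstruction speed may be called a first reconstruction speed. Both are reconstruction speeds and, in some cases, may be separate and different reconstruction speeds.
In this application, "at least one" means one or more, and "a plurality of" means two or more; for example, a plurality of hard disks means two or more hard disks. The terms "system" and "network" are often used interchangeably herein.
It should be understood that the terms used in the description of the various examples herein are only for describing particular examples and are not intended to be limiting. As used in the description of the various examples and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" in this application generally indicates an "or" relationship between the associated objects before and after it.
It should also be understood that, in the embodiments of this application, the sequence numbers of the processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
It should be understood that determining B according to A does not mean that B is determined only according to A; B may also be determined according to A and/or other information.
It should also be understood that the term "include" (also "includes", "including", "comprises", and/or "comprising"), when used in this specification, specifies the presence of stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.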
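It should also be understood that the term "if" may be interpreted as meaning "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined..." or "if [a stated condition or event] is detected" may be interpreted as meaning "upon determining...", "in response to determining...", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]".
It should be understood that "one embodiment", "an embodiment", and "a possible implementation" mentioned throughout the specification mean that a particular feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of this application. Therefore, "in one embodiment", "in an embodiment", or "a possible implementation" appearing in various places throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.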
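The following describes, by way of example, the application scenarios of this application.
The data reconstruction method provided in the embodiments of this application can be applied to scenarios in which a storage device performs data reconstruction. Specifically, the method can be applied when a centralized storage device or a distributed storage system performs data reconstruction. The data reconstruction scenario is briefly introduced below.
Data reconstruction is one of the effective ways for a storage device to recover data and one of the key technologies for ensuring storage reliability. As media technology keeps developing, the capacity of a single disk keeps growing, which lengthens disk reconstruction time. For large-capacity disks in particular, the long reconstruction time leads to lower data reliability.
For a storage device, the shorter the time required for data reconstruction, the higher the reliability of the storage device, so speeding up reconstruction has become a popular research direction. The most direct way is to improve the EC algorithm, for example by reducing the data that reconstruction depends on; however, the differences between EC algorithms are not significant, and it is hard to open up a large gap this way. Another current approach is to speed up reconstruction by reducing or stopping other services or by adding system resources, but this has a large impact on other services (for example, upper-layer services). How to increase the reconstruction speed while keeping the impact on host services small has therefore become a focus of the industry.
In view of this, the following embodiments provide a strategy for dynamically adjusting the reconstruction speed based on service pressure. By using the spare resources of the system to the greatest extent and raising their utilization, reconstruction is sped up as much as possible while upper-layer services are protected. On the one hand, future service pressure patterns are predicted from historical ones to learn the resource headroom of the system, so that the pattern of upper-layer service changes is captured accurately. On the other hand, a model that characterizes the resource consumption of reconstruction is built, which establishes the relationship between reconstruction speed and resource consumption. Combining the two, within a range where the service impact is controllable, an initial reconstruction speed is obtained from the predicted resource headroom, and a feedback system with a dynamic step based on service pressure then quickly adjusts the initial speed to the optimal reconstruction speed. In this way, service pressure prediction and the reconstruction resource consumption model make it possible to use the remaining resources of the device as much as possible and raise the reconstruction speed without affecting other services, while also ensuring service continuity.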
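The following describes the system architectures provided in the embodiments of this application.
Referring to FIG. 1, this embodiment provides a system architecture 100, which is an example of a centralized storage device. The centralized storage device is, for example, a storage array. The storage array includes one or more controllers, and each controller includes one or more hard disks. When a hard disk in the storage array fails, the controller can reconstruct the data stored on the failed disk. The controller of a storage array is also called a storage controller and is colloquially called the head. Referring to FIG. 1, the controller of the storage array is, for example, the controller 101 in FIG. 1, and the hard disks in the storage array are the hard disk 102, the hard disk 103, the hard disk 104, and the hard disk 105 in FIG. 1. The ellipsis in FIG. 1 represents other hard disks that are not shown. A hard disk is, for example, a solid state drive (SSD) or a hard disk drive (HDD). Optionally, the hard disks in the system architecture 100 are smart disks, which have their own processor, memory, and other resources required for computation.
Referring to FIG. 2, this embodiment provides another system architecture 200, which is an example of a distributed storage system. The distributed storage system includes multiple storage nodes. A storage node is, for example, a server, and the server includes one or more hard disks. For example, referring to FIG. 2, a storage node is the server 201, the server 202, the server 203, or the server 204 in FIG. 2.
In addition, the distributed storage system optionally further includes computing nodes, which include but are not limited to storage clients, Meta Data Controller (MDC) nodes, Elastic Compute Service (ECS) nodes, and Volume Backup Service (VBS) nodes. A computing node is, for example, a host, a server, a personal computer, or another device with computing capability. For example, referring to FIG. 2, a computing node is the server 205 or the server 206 in the system architecture 200.
The foregoing describes the system architectures provided in the embodiments of this application. The following uses Embodiment 1 to Embodiment 3 to describe, by way of example, the method flows for data reconstruction based on these system architectures.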
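Embodiment 1
Referring to FIG. 3, FIG. 3 is a flowchart of a data reconstruction method according to an embodiment of this application. The method is applied to a storage device.
Optionally, the storage device is a controller in a storage array. When reconstructing the data of a failed disk in the storage array, the controller performs Embodiment 1 to adjust the reconstruction speed based on the service processing pressure of the storage array. For example, the storage device is the controller 101 in the system architecture 100. While reconstructing the data of a failed disk among the hard disk 102, the hard disk 103, the hard disk 104, and the hard disk 105, the controller 101 performs Embodiment 1 to adjust the reconstruction speed based on the service processing pressure of the system architecture 100.
Optionally, the storage device is a storage node in a distributed storage system. When reconstructing the data of a failed disk in the distributed storage system, the storage node performs Embodiment 1 to adjust the reconstruction speed based on the service processing pressure of the distributed storage system. For example, the storage device is the server 201 in the system architecture 200. While reconstructing the data of a failed disk in the server 201, the server 202, or the server 203, the server 201 performs Embodiment 1 to adjust the reconstruction speed based on the service processing pressure of the system architecture 200.
By way of example, Embodiment 1 includes the following S301 to S304.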
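S301. The storage device obtains service pressure information.
The service pressure information indicates the service processing pressure of the storage device. For example, when the storage device receives an access request, it has to perform a data read service; when it receives a write request, it has to perform a data storage service. While processing services, the storage device is under service processing pressure. The service pressure information indicates the magnitude of that pressure, so that the reconstruction speed can be adjusted with the service processing pressure taken into account. The data form of the service pressure information may be, but is not limited to, a number, a vector, a matrix, or another form.
In a possible implementation, the service pressure information includes resource occupation information of the storage device, for example resource utilization, resource usage, or remaining resources. The resources include but are not limited to at least one of computing resources, storage resources, and network resources. A computing resource is, for example, a processor of the storage device, such as a general-purpose central processing unit (CPU) or a graphics processing unit (GPU); a storage resource is, for example, a hard disk of the storage device; and a network resource is, for example, a network interface card or bandwidth of the storage device. As examples, the service pressure information includes but is not limited to any one of the following (1) to (12).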
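(1) The CPU utilization of the storage device.
(2) The input/output operations per second (IOPS) of the storage device.
(3) The disk bandwidth of the storage device, where the disk bandwidth includes at least one of read bandwidth and write bandwidth.
(4) The Internet Protocol (IP) enclosure bandwidth of the storage device.
(5) The garbage collection (GC) concurrency characteristics of the storage device.
(6) The deduplication and compression characteristics of the storage device.
(7) The throughput of the storage device.
(8) The bandwidth of the disk array cards of the storage device, where the storage device includes one or more disk array cards. Disk array cards can be of several types, including but not limited to at least one of a Serial Attached SCSI (SAS) array card, a Small Computer System Interface (SCSI) array card, a Serial Advanced Technology Attachment (SATA) array card, and an Integrated Drive Electronics (IDE) array card. In other words, the bandwidth of a SAS, SCSI, SATA, or IDE array card can each serve as service pressure information for calculating the reconstruction speed.
(9) The size of the access requests received by the storage device per unit time period.
(10) The read/write ratio of the input/output (IO) requests received by the storage device per unit time period. The read/write ratio includes at least one of the proportion of read requests among the IO requests and the proportion of write requests among the IO requests.
(11) The largest access request received by the storage device per unit time period.
(12) The average access request received by the storage device per unit time period.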
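With the foregoing implementation, the resources of the storage device are used to characterize its service processing pressure. For example, when performing data reconstruction makes the storage device consume a large amount of resources, the resource usage of the storage device is high; using resource usage as the service pressure information therefore captures the magnitude of the service processing pressure accurately. This helps the dynamic adjustment of the reconstruction speed adapt to complex customer scenarios and changing system resources, and improves adaptability to scenarios and devices.
Optionally, the service pressure information is one of the foregoing (1) to (12). For example, the service pressure information is simply the IOPS, or simply the CPU utilization.
Optionally, the service pressure information is a combination of two or more of the foregoing (1) to (12). When the service pressure information includes several of (1) to (12), each dimension of the service pressure information can be one of (1) to (12).
The ways of combining (1) to (12) include but are not limited to feature concatenation. Feature concatenation is a term from the field of machine learning. It means combining features of multiple dimensions by horizontal or vertical concatenation to obtain data that contains the features of every dimension; put simply, it can be seen as concatenating several vectors into one large matrix. With feature concatenation, a column of the service pressure information can be any one of (1) to (12), or a row of the service pressure information can be any one of (1) to (12). For example, the service pressure information includes CPU utilization, IOPS, and disk bandwidth as its three dimensions: CPU utilization is the first column, IOPS is the second column, and disk bandwidth is the third column.
Feature concatenation is only an optional, not a mandatory, way of combining data of different dimensions in the service pressure information. Optionally, several of (1) to (12) are introduced into the service pressure information by feature fusion. Feature fusion is also a term from the field of machine learning. It means converting features of multiple dimensions into a single value by multiplication or addition, so that the value fuses the features of every dimension.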
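The difference between feature concatenation and feature fusion can be shown with plain Python lists. This is only a sketch: the function names `concatenate_features` and `fuse_features`, the column layout, and the weighted-sum form of fusion are assumptions chosen for illustration.

```python
from typing import List, Sequence


def concatenate_features(*series: Sequence[float]) -> List[List[float]]:
    """Stack equal-length metric series as columns of a matrix.

    Row t of the result holds every metric sampled at time point t, so
    column 0 could be CPU utilization, column 1 IOPS, and so on.
    """
    lengths = {len(s) for s in series}
    if len(lengths) != 1:
        raise ValueError("all series must have the same length")
    return [list(row) for row in zip(*series)]


def fuse_features(row: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse one row of metrics into a single number by a weighted sum."""
    if len(row) != len(weights):
        raise ValueError("row and weights must have the same length")
    return sum(value * weight for value, weight in zip(row, weights))
```

Concatenation keeps every dimension visible to a downstream model, whereas fusion reduces them to one scalar and therefore depends on how the weights are chosen.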
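It should also be understood that (1) to (12) are data that the service pressure information includes by way of illustration. In other embodiments, the service pressure information includes data other than (1) to (12), or (1) to (12) can be omitted or left unused. In some optional embodiments, one or more of (1) to (12) are replaced with other data, which includes but is not limited to free storage space, the proportion of hot data, and so on.
There are multiple ways to obtain the service pressure information. In a possible implementation, the storage device predicts the service pressure information based on the historical pattern of its services. The following S3011 to S3013 illustrate this implementation.
S3011. The storage device obtains historical service pressure information.
The historical service pressure information indicates the service processing pressure of the storage device at historical time points. The granularity of a historical time point includes but is not limited to seconds, minutes, and hours. Taking a granularity of seconds as an example, the historical service pressure information indicates the service processing pressure of the storage device within a past second.
In a possible implementation, the historical service pressure information includes but is not limited to any one of the following (1) to (12).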
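(1) The CPU utilization of the storage device at a historical time point.
(2) The IOPS of the storage device at a historical time point.
(3) The disk bandwidth of the storage device at a historical time point, where the disk bandwidth includes at least one of read bandwidth and write bandwidth.
(4) The IP enclosure bandwidth of the storage device at a historical time point.
(5) The GC concurrency characteristics of the storage device at a historical time point.
(6) The deduplication and compression characteristics of the storage device at a historical time point.
(7) The throughput of the storage device at a historical time point.
(8) The bandwidth of the disk array cards of the storage device at a historical time point.
(9) The size of the access requests received by the storage device per unit time period at a historical time point.
(10) The read/write ratio of the IO requests received by the storage device per unit time period at a historical time point. The read/write ratio includes at least one of the proportion of read requests among the IO requests and the proportion of write requests among the IO requests.
(11) The largest access request received by the storage device per unit time period at a historical time point.
(12) The average access request received by the storage device per unit time period at a historical time point.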
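In some embodiments, the storage device collects historical service pressure information during past operation and saves it to a hard disk. When the current service pressure is to be predicted, the storage device reads the historical service pressure information from the hard disk. Optionally, the historical service pressure information is collected periodically; specifically, the storage device collects it once every collection period, where the time unit of the period is, for example, minutes or hours. Optionally, the storage device can also collect historical service pressure information in real time.
S3012. The storage device inputs the historical service pressure information into a prediction model.
The prediction model is, for example, a function. Its input parameters include the historical service pressure information, and its output parameters include the service pressure information at future time points. A future time point is a time point later than the historical time points, and there is a time interval of a certain size between the two. The minimum interval between a future time point and a historical time point is, for example, 1 minute.
There are multiple ways to predict the service pressure information. Case ① and case ② below serve as examples.
Case ①: The service pressure information is predicted from all historical service pressure information collected over a period of time.
Taking a granularity of 1 minute as an example, the service pressure information of day 4 is predicted from the per-minute historical service pressure information of the 24 hours of day 1, of day 2, and of day 3. Predicting day 4 from the previous 3 days is only an example; future service pressure information can also be predicted from the historical service pressure information of the most recent month. This embodiment does not limit how long a span of historical service pressure information is used for prediction.
Case ②: The service pressure information of a corresponding time point is predicted from the historical service pressure information.
In case ②, the future time point and the historical time point belong to the same time slot. For example, the historical time points include 8 p.m. on January 31, and the service pressure information for 8 p.m. on February 1 is predicted from the historical service pressure information of 8 p.m. on January 31. Although the two time points are a day apart, both fall into the time slot around 8 o'clock, and the access pattern of the storage device within the same time slot is likely to be similar, which helps improve the accuracy of the predicted service pressure information.
Optionally, the historical service pressure information input into the prediction model is in the form of a time series, and so is the service pressure information output by the prediction model. For example, the historical service pressure information includes N1 data items corresponding to N1 historical time points; each item indicates the service processing pressure of the storage device at one historical time point, and the N1 items are ordered by their historical time points. The service pressure information output by the prediction model includes N2 data items corresponding to N2 future time points; each item indicates the service processing pressure of the storage device at one future time point, and the N2 items are ordered by their future time points. In this way, the service pressure at N2 future moments can be predicted algorithmically from the service pressure at the past N1 moments. N1 and N2 are both positive integers, N1 is greater than 1, and N2 is greater than or equal to 1.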
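The prediction model can be of multiple types, illustrated by case (a) and case (b) below.
Case (a): The prediction model is a time-series prediction model, for example an autoregressive integrated moving average (ARIMA) model, an exponential smoothing model, or a period identification model.
Case (b): The prediction model is a machine learning model, for example a linear fitting model, a logistic regression model, or a deep learning model. When a deep learning model is used, the prediction model is, for example, a convolutional neural network or a long short-term memory (LSTM) network. Optionally, the deep learning model is obtained by training with historical service pressure information as samples and the service pressure information at future time points as target values.
S3013. The storage device processes the historical service pressure information through the prediction model and outputs the service pressure information.
The processing performed by the prediction model may differ with the specific type of model. For example, if the prediction model is an ARIMA model, processing the historical service pressure information includes an autoregression computation and a moving-average computation. If the prediction model is a deep learning model, processing the historical service pressure information includes feature extraction and classification based on the extracted features.
S3011 to S3013 achieve the following effect: the service pressure information reflects how the service processing pressure of the storage device changes over time. By collecting historical service pressure information during its past operation and applying the prediction model, the storage device can mine this pattern from the historical data and predict future service pressure information. Service pressure information predicted in this way is relatively accurate, so using it to determine the reconstruction speed helps improve the accuracy of the reconstruction speed.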
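As a rough illustration of S3011 to S3013, the sketch below predicts the next day's pressure for each time slot as the mean of the same slot over the preceding days. This per-slot average is a deliberately simple stand-in chosen for the example; it is not the ARIMA, exponential smoothing, or deep learning model that the text mentions, and the function name and data layout are assumptions.

```python
from typing import List


def forecast_next_day(history: List[List[float]]) -> List[float]:
    """Predict one value per time slot for the next day.

    history[d][s] is the observed pressure on day d at slot s
    (for example, one slot per minute). The forecast for slot s is the
    mean of slot s over all recorded days.
    """
    if not history:
        raise ValueError("history must contain at least one day")
    slots = len(history[0])
    if any(len(day) != slots for day in history):
        raise ValueError("every day must have the same number of slots")
    days = len(history)
    return [sum(day[s] for day in history) / days for s in range(slots)]


# Three days of history with two slots per day (hypothetical values).
history = [
    [10.0, 20.0],
    [20.0, 40.0],
    [30.0, 60.0],
]
day4 = forecast_next_day(history)
```

The input is a time series ordered by historical time point and the output is a time series ordered by future time point, matching the N1-to-N2 shape described above; a real deployment would replace the averaging with a trained model.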
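S302. The storage device determines that its performance indicator at the first reconstruction speed satisfies a preset condition.
In this embodiment, the reconstruction speed of the storage device is no longer fixed but can be adjusted dynamically, and its value changes after an adjustment. To distinguish the two, the reconstruction speed before adjustment is called the first reconstruction speed, and the reconstruction speed after adjustment is called the second reconstruction speed.
The first reconstruction speed is the current data reconstruction speed of the storage device. Optionally, the first reconstruction speed is the initial value of the reconstruction speed in the data reconstruction process.
Optionally, the first reconstruction speed is determined from the service pressure information; in other words, it is a recommended reconstruction speed derived from the service pressure information. Specifically, once the service processing pressure at a future time point has been predicted, a recommended reconstruction speed for that time point can be provided, and when that time point arrives, data reconstruction starts with the recommended speed as the initial value. This recommended reconstruction speed is the first reconstruction speed.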
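The first reconstruction speed can be determined from the service pressure information in multiple ways. In one possible implementation, the storage device inputs the service pressure information into a reconstruction speed determination model, which processes it and outputs the first reconstruction speed. Optionally, the reconstruction speed determination model is a machine learning model, for example a linear fitting model, a logistic regression model, a support vector regression (SVR) model, a deep neural network (DNN) model, or a convolutional neural network model.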
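Expressed mathematically, the reconstruction speed determination model is, for example, formula (1) below.
$y_0 = f(R)$; Formula (1)
In formula (1), $R$ denotes the service pressure information, in the form of, for example, a matrix or a number; $y_0$ denotes the first reconstruction speed (for example, the initial value of the reconstruction speed); and $f$ denotes the mapping, that is, a function. Optionally, the initial value of the reconstruction speed can be determined at different levels, for example by multiplying $y_0$ by a level coefficient or in another way.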
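In one possible implementation, the service pressure information is a time series that indicates the service processing pressure at multiple future time points. From the service pressure information, the storage device determines the first reconstruction speed for each of these future time points and saves the correspondence between future time points and first reconstruction speeds. The storage device also starts a timer; when a future time point arrives, it looks up the saved correspondence and starts data reconstruction with the first reconstruction speed for that time point as the initial value. For example, the service pressure information indicates that the service processing pressure of the storage device will reach pressure 1 at 8 p.m. and pressure 2 at 9 p.m. The storage device recommends reconstruction speed A for pressure 1 and reconstruction speed B for pressure 2. At 8 p.m., the storage device starts data reconstruction with reconstruction speed A as the initial value, so the first reconstruction speed is reconstruction speed A. Likewise, at 9 p.m., the storage device starts data reconstruction with reconstruction speed B as the initial value, so the first reconstruction speed is reconstruction speed B. In this way, at each moment the storage device starts reconstruction from the speed that corresponds to the predicted service pressure, so that the reconstruction speed at each moment matches the service processing pressure at that moment and the data reconstruction speed is adjusted flexibly.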
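A minimal sketch of this time-point-to-speed correspondence follows. The linear mapping used for $f$, the speed bounds, the normalization of pressure to the range 0 to 1, and all names are assumptions made for illustration; the application leaves $f$ open (it may be a trained model).

```python
from typing import Dict


def recommend_speed(pressure: float,
                    max_speed: float = 200.0,
                    min_speed: float = 20.0) -> float:
    """Hypothetical f(R): map a pressure in [0, 1] linearly to a speed in
    MB/s, so that higher pressure yields a lower initial speed."""
    p = min(max(pressure, 0.0), 1.0)   # clamp to [0, 1]
    return max_speed - p * (max_speed - min_speed)


def build_schedule(predicted: Dict[int, float]) -> Dict[int, float]:
    """Map each future time point (hour of day here) to its first
    reconstruction speed."""
    return {hour: recommend_speed(p) for hour, p in predicted.items()}


# Predicted pressure: fully loaded at 20:00, half loaded at 21:00.
schedule = build_schedule({20: 1.0, 21: 0.5})
```

When the timer reaches a key of `schedule`, reconstruction would start from the stored value as its initial speed.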
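In some embodiments, during data reconstruction the storage device decides whether to adjust the reconstruction speed according to how much the current reconstruction speed affects device performance. Optionally, device performance is characterized by the values of the device's performance indicators. In one possible implementation, while reconstructing data with the first reconstruction speed as the initial speed, the storage device determines whether its performance indicator at the first reconstruction speed satisfies the preset condition. If it does, the storage device performs S303 to adjust the reconstruction speed; if it does not, the storage device keeps the current reconstruction speed.
A performance indicator indicates the performance of the storage device. For example, the performance indicator includes IO request latency, which is, for example, the time from when the storage device receives an IO request until it finishes reading or writing the data. Optionally, the value of the performance indicator is an average over a unit time period. For example, the storage device collects the total number of IO requests processed in the unit time period and the total time spent processing them, and divides the total time by the total number to obtain the IO request latency for that period.
The preset condition includes: the gap between the current value of the performance indicator and its expected value is greater than a threshold. Optionally, the expected value is set in advance by the user. For example, a logical unit (LUN; a LUN is a logical disk virtualized from storage hardware, and the operating system of the storage device usually treats a LUN as a usable hard disk) is created in the storage device, and the latency of IO issued to the LUN is expected not to exceed M even while the storage device is reconstructing data. M, a positive number, is then the expected value of the performance indicator. With this preset condition, determining whether the performance indicator satisfies the condition includes the following: the storage device measures the current value of the performance indicator, reads the preset expected value, computes the gap between the two, and compares the gap with the threshold. If the gap is greater than the threshold, the current value is far from the expected value, and the storage device performs S303 to adjust the reconstruction speed and reduce the impact of data reconstruction on device performance. If the gap is less than the threshold, the current value is close to the expected value, and the storage device keeps the current reconstruction speed.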
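Optionally, if the gap between the current value and the expected value of the performance indicator equals the threshold, the storage device does either of the following, without limitation.
Manner I: The storage device performs S303 to adjust the reconstruction speed.
Manner II: The storage device does not adjust the reconstruction speed and keeps the current reconstruction speed.
Optionally, the threshold that the storage device uses when checking whether the gap exceeds the threshold is a preset value. The threshold is used to judge the size of the gap between the current value and the expected value of the performance indicator.
Optionally, when the reconstruction speed is adjusted by adding an adjustment step to the current reconstruction speed: if the gap between the current value and the expected value of the performance indicator is less than the threshold, the storage device sets the adjustment step to 0 and thereby keeps the current reconstruction speed; if the gap is greater than the threshold, the storage device computes the adjustment step and adjusts the reconstruction speed by it. Expressed mathematically, the storage device checks whether $|t_0 - t_1| < e$; if so, the adjustment step is zero. Here $e$ denotes the threshold and is a constant. If $|t_0 - t_1| = e$, in one possible implementation the storage device adjusts the reconstruction speed by the adjustment step, and in another possible implementation it sets the adjustment step to zero and keeps the current reconstruction speed. In other words, this embodiment does not limit whether the storage device adjusts the reconstruction speed when $|t_0 - t_1| = e$.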
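The latency averaging and the threshold check described above can be sketched as follows. The function names, the millisecond unit, and the `adjust_on_equal` flag (which selects between Manner I and Manner II) are illustrative assumptions.

```python
def average_latency(total_duration_ms: float, request_count: int) -> float:
    """Average IO latency over one unit time period:
    total processing time divided by the number of requests."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return total_duration_ms / request_count


def needs_adjustment(current: float, expected: float, threshold: float,
                     adjust_on_equal: bool = False) -> bool:
    """Return True when |t0 - t1| exceeds the threshold e.

    The boundary case |t0 - t1| == e is left open by the text, so it is
    exposed here as a parameter (an assumption of this sketch).
    """
    gap = abs(expected - current)
    if gap == threshold:
        return adjust_on_equal
    return gap > threshold
```

For instance, with an expected latency of 5 ms and a threshold of 2 ms, a measured 8 ms triggers S303 while a measured 5.5 ms leaves the speed unchanged.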
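Deciding whether to adjust the reconstruction speed according to whether the performance indicator satisfies the condition achieves at least the following effect. During data reconstruction, the current value of the performance indicator reflects the performance of the storage device at the current reconstruction speed, and the expected value reflects the maximum impact that data reconstruction is allowed to have on device performance. When the gap between the two is greater than the threshold, the current reconstruction speed is already affecting device performance heavily. Adjusting the reconstruction speed then reduces that impact, helps prevent data reconstruction from causing a sharp drop in device performance, and ensures that the performance indicator still meets expectations during data reconstruction. When the gap is less than the threshold, continuing at the current reconstruction speed makes full use of the storage device's resources for reconstruction and improves resource utilization.
It should be understood that S302 is optional. Optionally, after performing S301, the storage device skips S302 and performs S303 directly.
S303. The storage device adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.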
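The second reconstruction speed is negatively correlated with the service processing pressure; for example, the second reconstruction speed is inversely proportional to the service processing pressure. Negative correlation includes, but is not limited to, the following two aspects.
Aspect 1: The lower the service processing pressure, the higher the second reconstruction speed. That is, when the service processing pressure of the storage device falls, the data reconstruction speed rises and data reconstruction speeds up. When the storage device is idle and its service processing pressure is low, raising the reconstruction speed makes full use of the idle resources and improves resource utilization. Because reconstruction finishes sooner, the reliability of the storage device improves, which in turn helps improve its cost-effectiveness.
Aspect 2: The higher the service processing pressure, the lower the second reconstruction speed. That is, when the service processing pressure of the storage device rises, the data reconstruction speed falls and data reconstruction slows down. When the storage device is busy and its service processing pressure is high, lowering the reconstruction speed reduces the impact of data reconstruction on device performance, keeps the impact on services within a controllable range, and avoids service blocking, device outages, and similar failures caused by data reconstruction.
Together, these two aspects form a strategy for dynamically adjusting the reconstruction speed that helps balance reconstruction speed against device performance.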
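When the storage device undergoes a system upgrade, its capability may improve as a result, which changes its service processing pressure. Optionally, in this scenario, the service pressure information after the upgrade is obtained, which indicates the service processing pressure of the upgraded storage device, and the first reconstruction speed is adjusted according to it to obtain the second reconstruction speed. In this way, the current capability of the storage device is sensed dynamically and taken into account when adjusting the reconstruction speed, so that the reconstruction speed suits the upgraded device. If the service processing pressure increases after the upgrade, the method automatically slows down data reconstruction and avoids service blocking; if the service processing pressure decreases after the upgrade, the method automatically speeds up data reconstruction and improves resource utilization.
The reconstruction speed can be adjusted according to the service pressure information in multiple ways.
In one possible implementation, the reconstruction speed is adjusted by feedback regulation, in which the output of a system is fed back as an input parameter to regulate that system. Applied to data reconstruction: because reconstruction may affect the performance of the storage device, the storage device, while reconstructing data at the first reconstruction speed, can adjust the first reconstruction speed according to the current value of the performance indicator and the service pressure information. This feedback regulation helps steer the reconstruction speed dynamically toward the optimum.
For example, the feedback regulation includes S3031 and S3032 below.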
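S3031. The storage device determines an adjustment step.
The adjustment step is the step size of the reconstruction speed adjustment, that is, the magnitude by which the reconstruction speed changes in one adjustment, or the increment of the reconstruction speed. The adjustment step includes, but is not limited to, case 1 and case 2 below.
Case 1: The adjustment step changes dynamically.
Optionally, during data reconstruction the storage device adjusts the reconstruction speed multiple times, gradually approaching the second reconstruction speed from the first reconstruction speed. A different adjustment step may be used for each adjustment.
Optionally, the closer the current reconstruction speed is to the second reconstruction speed, the smaller the adjustment step. For example, a first adjustment step is used early in data reconstruction and a second adjustment step is used late in data reconstruction, and the first adjustment step is larger than the second. Early on, when adjustment has just begun, the reconstruction speed is changed by large amounts so that it approaches the second reconstruction speed quickly; later, when adjustment is nearly complete, the reconstruction speed is changed by small amounts. This shortens the ramp time, which is the time needed to complete the adjustment using the adjustment steps.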
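The adjustment step can be determined in multiple ways, illustrated by implementation 1 and implementation 2 below.
Implementation 1: The current value of the performance indicator is obtained, and the adjustment step is obtained from the gap between the current value and the expected value of the performance indicator. The adjustment step is positively correlated with this gap: the larger the gap, the larger the adjustment step. Expressed mathematically, the adjustment step is, for example, formula (2) below:
$\Delta y = C\,|t_0 - t_1|$; Formula (2)
In formula (2), $\Delta y$ denotes the adjustment step, $C$ is a constant, $t_0$ denotes the expected value of the performance indicator, $t_1$ denotes the current value of the performance indicator at the current reconstruction speed, and $|\cdot|$ denotes the absolute value.
Implementation 1 thus adjusts the rate of change of the reconstruction speed dynamically according to the performance indicator of the storage device.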
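Implementation 2: The adjustment step is obtained from the service pressure information and is negatively correlated with the service processing pressure. For example, the adjustment step is inversely proportional to the service processing pressure: the higher the pressure, the smaller the step, and the lower the pressure, the larger the step. Optionally, implementation 2 also takes the performance indicator into account: the current value of the performance indicator and the service pressure information are obtained, and the adjustment step is obtained from the service pressure information and the gap between the current value and the expected value of the performance indicator. For example, the adjustment step is computed by formula (3) below.
$\Delta y = K\,|t_0 - t_1| / R$; Formula (3)
In formula (3), $\Delta y$ denotes the adjustment step, $K$ denotes a constant, $R$ denotes the service pressure information, $t_0$ denotes the expected value of the performance indicator, $t_1$ denotes the current value of the performance indicator at the current reconstruction speed, $/$ denotes division, and $|\cdot|$ denotes the absolute value. Formula (3) shows that $\Delta y$ is inversely proportional to $R$, which guarantees that the adjustment step is negatively correlated with the service pressure and makes the adjustment step a mapping of the service pressure.
Case 2: The adjustment step is fixed.
For example, an adjustment step is set in advance, and the same preset adjustment step is used every time the reconstruction speed is adjusted.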
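Formulas (2) and (3) for the dynamic adjustment step translate directly into code. The sketch below assumes that the service pressure information has been reduced to a single positive number; the function and parameter names are hypothetical.

```python
def step_from_gap(expected: float, current: float, c: float) -> float:
    """Formula (2): step = C * |t0 - t1|."""
    return c * abs(expected - current)


def step_from_gap_and_pressure(expected: float, current: float,
                               pressure: float, k: float) -> float:
    """Formula (3): step = K * |t0 - t1| / R.

    `pressure` is R reduced to one positive scalar (an assumption of this
    sketch); a larger R gives a smaller step.
    """
    if pressure <= 0:
        raise ValueError("pressure must be positive")
    return k * abs(expected - current) / pressure
```

With an expected latency of 5 and a current latency of 8, formula (2) with C = 2 gives a step of 6; formula (3) with K = 4 gives 6 at R = 2 and 3 at R = 4, showing the inverse dependence on pressure.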
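S3032. The storage device obtains the second reconstruction speed from the adjustment step and the first reconstruction speed.
The second reconstruction speed is the sum of the first reconstruction speed and the adjustment step, where the sign function in formula (4) determines whether the step is added or subtracted. For example, the second reconstruction speed is expressed by formula (4) below.
$y = y_0 + \operatorname{sgn}(t_0 - t_1) \cdot \Delta y$; Formula (4)
In formula (4), $y$ denotes the second reconstruction speed, that is, the adjusted reconstruction speed; $y_0$ denotes the first reconstruction speed, for example the initial value of the reconstruction speed; $\operatorname{sgn}$ is the sign function; $\Delta y$ denotes the adjustment step; $t_0$ denotes the expected value of the performance indicator; $t_1$ denotes the current value of the performance indicator at the current reconstruction speed; and $\cdot$ denotes multiplication.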
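Because the service pressure information predicted from historical service pressure information may deviate somewhat from the actual service processing pressure, the first reconstruction speed determined from it may not be the best data reconstruction speed at the moment. With the above method, reconstruction not only starts from the first reconstruction speed as the initial value but also uses feedback regulation: the adjustment step is determined from the current value of the performance indicator fed back by the storage device, and the speed is adjusted from the initial value by that step. This helps reach the best data reconstruction speed under the actual service pressure quickly and shortens the ramp time of the reconstruction speed.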
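A single feedback update according to formula (4) can be sketched as below, taking IO latency as the performance indicator. The clamping of the result to a speed range is an extra safeguard added for this example and is not stated in the application; the names are hypothetical.

```python
def sgn(x: float) -> int:
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def adjust_speed(y0: float, expected: float, current: float, step: float,
                 min_speed: float = 0.0,
                 max_speed: float = float("inf")) -> float:
    """Formula (4): y = y0 + sgn(t0 - t1) * step.

    When the current latency t1 exceeds the expected latency t0, the sign
    is negative and the speed drops; when t1 is below t0, the speed rises.
    The result is clamped to [min_speed, max_speed] (an assumption).
    """
    y = y0 + sgn(expected - current) * step
    return min(max(y, min_speed), max_speed)
```

Starting from a speed of 100 with a step of 6 and an expected latency of 5, a measured latency of 8 lowers the speed to 94, whereas a measured latency of 3 raises it to 106.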
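S304. The storage device reconstructs the data stored on the faulty disk in the storage device at the second reconstruction speed.
Optionally, when the service processing pressure of the storage device switches, the storage device adjusts early or late in order to reduce the impact of the speed adjustment on services and to reduce fluctuations in device performance.
In the following, the time point at which the storage device starts reconstructing data at the adjusted reconstruction speed is called the first time point, and the time point at which the service processing pressure of the storage device changes is called the second time point. If a change in service processing pressure is detected at the second time point, the second time point is offset by a preset duration to obtain the first time point, and at the first time point the data stored on the faulty disk is reconstructed at the second reconstruction speed.
Optionally, whether the storage device adjusts early or late depends on how the service pressure switches. The switch is either from high to low or from low to high, as described in case A and case B below.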
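Case A: If the service processing pressure of the storage device drops, that is, switches from high to low, the storage device adjusts the reconstruction speed late. Specifically, if the storage device detects that the service processing pressure drops at the second time point, it shifts the second time point backward by the preset duration to obtain a first time point later than the second time point, and at the first time point it reconstructs the data stored on the faulty disk at the second reconstruction speed. For example, referring to Figure 4, the second time point is t_a, the first time point is t_b, and the preset duration is ΔT. The service pressure curve shows that the service processing pressure of the storage device drops at time t_a; the adjustment then lags by ΔT, and the reconstruction speed of the storage device rises at time t_b. Optionally, the reconstruction speed is adjusted by sending a reconstruction speed instruction; by sending a reconstruction speed instruction carrying the second reconstruction speed at a time point after the pressure drops, the storage device achieves the late adjustment.
When the service processing pressure drops, adjusting the reconstruction speed late ensures a smooth transition from high service pressure to low service pressure and ensures that the IO requests of the existing load are completed.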
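Case B: If the service processing pressure of the storage device rises, that is, switches from low to high, the storage device adjusts the reconstruction speed early. Specifically, if the storage device detects that the service processing pressure rises at the second time point, it shifts the second time point forward by the preset duration to obtain a first time point earlier than the second time point, and at the first time point it reconstructs the data stored on the faulty disk at the second reconstruction speed. For example, referring to Figure 4, the second time point is t_c, the first time point is t_d, and the preset duration is ΔT. The service pressure curve shows that the service processing pressure of the storage device rises at time t_c; the adjustment is advanced by ΔT, so that at time t_d, which precedes t_c by ΔT, the reconstruction speed has already started to fall. Optionally, the reconstruction speed is adjusted by sending a reconstruction speed instruction; by sending a reconstruction speed instruction carrying the second reconstruction speed at a time point before the pressure rises, the storage device achieves the early adjustment.
In case B, when the service processing pressure is about to rise, lowering the reconstruction speed in advance ensures that the reconstruction speed has already been reduced to a reasonable value by the time the pressure rises, which avoids the service blocking that would result from reconstructing at high speed under high service pressure.
It should be understood that using the preset duration as the amount of lag or advance is only an example. In other possible embodiments, the amount of lag or advance is not a preset duration but is determined from the service processing pressure. For example, the larger the rise in service processing pressure, the larger the advance, and the larger the drop in service processing pressure, the larger the lag.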
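The rule for deriving the first time point from the second time point can be sketched as follows. Times are plain numbers in arbitrary units, and the function name and signature are assumptions for illustration.

```python
def adjustment_time(change_time: float, pressure_before: float,
                    pressure_after: float, offset: float) -> float:
    """Return the first time point (when the new speed takes effect).

    change_time is the second time point, at which pressure changes.
    - Pressure drops (case A): adjust late, change_time + offset.
    - Pressure rises (case B): adjust early, change_time - offset.
    """
    if pressure_after < pressure_before:
        return change_time + offset   # lag behind the drop
    if pressure_after > pressure_before:
        return change_time - offset   # act ahead of the rise
    return change_time                # no change, no offset
```

Adjusting early only makes sense when the change is known in advance, which is where the predicted service pressure information comes in; a variant could scale `offset` with the size of the pressure change, as the preceding paragraph suggests.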
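In summary, the above embodiment provides a strategy for dynamically adjusting the reconstruction speed based on service pressure. Optionally, the strategy is applied to other tasks that are related to service pressure, such as GC tasks or data replication tasks. For example, the current GC speed is adjusted according to the service pressure information to obtain a target GC speed that is negatively correlated with the service processing pressure. GC then speeds up when the service processing pressure of the storage device is low and slows down when it is high, so the GC speed is adjusted dynamically based on service pressure and the impact of GC tasks on services is reduced. As another example, the current data replication speed is adjusted according to the service pressure information to obtain a target data replication speed that is negatively correlated with the service processing pressure. Data replication then speeds up when the service processing pressure is low and slows down when it is high, so the data replication speed is adjusted dynamically based on service pressure and the impact of data replication tasks on services is reduced.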
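The above method embodiment can be implemented by different modules of the storage device working together. For example, referring to Figure 5, this embodiment provides a data reconstruction system whose logical functional architecture is shown in Figure 5. The system includes multiple software functional modules, for example a pressure prediction module 501, a resource characterization module 502, a system scheduling module 503, a performance evaluation module 504, a step calculation module 505, a reconstruction control module 506, and a reconstruction computation module 507. The system scheduling module 503 may also be called a quality of service (QoS) module. Optionally, the pressure prediction module 501 and the resource characterization module 502 reside in a processor other than the controller, for example in a GPU, while the system scheduling module 503, the performance evaluation module 504, the step calculation module 505, the reconstruction control module 506, and the reconstruction computation module 507 reside in the controller. Optionally, all seven modules reside in the controller. The overall reconstruction flow across these modules includes S311 to S318, which illustrate S301 to S304 above.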
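S311. The controller collects historical service pressure information and saves it to a hard disk. The historical service pressure information is, for example, device resource data such as CPU utilization.
S312. The controller determines that a hard disk is damaged.
S313. The controller reads the historical service pressure information from the hard disk that stores it and sends it to the pressure prediction module 501.
S314. The pressure prediction module 501 predicts the future service pressure to obtain the service pressure information and sends it to the resource characterization module 502. The resource characterization module 502 computes the recommended reconstruction speed, that is, the first reconstruction speed, from the service pressure information and sends the first reconstruction speeds for the different moments to the system scheduling module 503 (QoS). The computations for pressure prediction and resource characterization can be offloaded, for example to a GPU or another attached device.
S315. The system scheduling module 503 sets the data reconstruction speed to the first reconstruction speed, sends the first reconstruction speed to the reconstruction control module 506, and notifies the reconstruction control module 506 to reconstruct data at the first reconstruction speed.
S316. The reconstruction control module 506 reads the reconstruction dependency data (the input data of the data reconstruction process) and passes it to the reconstruction computation module 507. After the computation completes, the reconstruction control module 506 saves the data and updates the metadata.
S317. The controller collects the device's performance indicators and, from their current values, judges the gap between the current reconstruction speed and the optimal reconstruction speed.
S318. The step calculation module 505 computes the adjustment step and sends it to the system scheduling module 503, which updates the reconstruction speed.
During data reconstruction, S316 to S318 can be repeated until reconstruction is complete.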
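Figure 6 shows the flowchart of the dynamic feedback adjustment, which includes S321 to S325.
S321. Start reconstruction.
S322. Set the reconstruction speed to the reconstruction speed for the corresponding moment. Specifically, the initial reconstruction speeds for the different moments are computed from the predicted service pressure, and at each moment reconstruction starts from the corresponding initial value.
S323. Judge the impact on the system of reconstructing at the current value. Specifically, compute $|t_0 - t_1|$ and compare it with $e$. If $|t_0 - t_1| < e$, perform S324; if $|t_0 - t_1| > e$, perform S325; if $|t_0 - t_1| = e$, perform either S324 or S325.
S324. The increment of the reconstruction speed is $\operatorname{sgn}(0)$, that is, zero, so the reconstruction speed stays unchanged. Return to S322.
S325. Obtain the adjustment step of the reconstruction speed based on the service pressure; the increment of the reconstruction speed is $\operatorname{sgn}(t_0 - t_1) \cdot \Delta y$. Return to S322.
S324 and S325 are the process of adjusting the reconstruction speed by $y = y_0 + \operatorname{sgn}(t_0 - t_1) \cdot \Delta y$. During data reconstruction, S322 to S325 are repeated until the condition $|t_0 - t_1| < e$ is satisfied.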
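The loop of S322 to S325 can be sketched as below. To make the example runnable, the device is replaced by a toy model in which IO latency grows linearly with reconstruction speed; that model, its coefficients, and the use of formula (2) for the step are assumptions of this sketch, not properties stated in the application.

```python
from typing import Callable


def sgn(x: float) -> int:
    """Sign function: 1, -1, or 0."""
    return (x > 0) - (x < 0)


def toy_latency(speed: float) -> float:
    """Hypothetical device model: latency in ms rises with speed."""
    return 2.0 + 0.05 * speed


def run_feedback(y0: float, expected: float, threshold: float, c: float,
                 measure: Callable[[float], float] = toy_latency,
                 max_rounds: int = 100) -> float:
    """Repeat S323 to S325 until |t0 - t1| < e or max_rounds is reached."""
    y = y0
    for _ in range(max_rounds):
        current = measure(y)                       # t1 at the current speed
        gap = abs(expected - current)              # |t0 - t1|
        if gap < threshold:                        # S324: zero increment
            break
        y += sgn(expected - current) * c * gap     # S325 with formula (2)
    return y


final_speed = run_feedback(y0=100.0, expected=5.0, threshold=0.1, c=10.0)
```

With these made-up coefficients the gap halves on every round, so the speed moves from 100 toward the value near 60 at which the toy latency meets the expected 5 ms. A constant `c` that is too large for the device's real response would make the loop overshoot, which is one reason a bounded `max_rounds` is included.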
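This embodiment provides a method for dynamically adjusting the reconstruction speed based on service processing pressure. The current data reconstruction speed of the storage device is adjusted according to the device's service pressure information, and data is reconstructed at the adjusted speed. When the storage device is idle and its service processing pressure is low, data reconstruction speeds up, which makes full use of idle resources, improves resource utilization, shortens reconstruction time, and improves device reliability. When the storage device is busy and its service processing pressure is high, data reconstruction slows down, which prevents reconstruction from occupying too many resources, reduces its impact on service processing performance, and avoids service blocking. The method therefore helps the storage device balance reconstruction speed against service processing performance.
Embodiment 1 describes a method for dynamically adjusting the reconstruction speed based on service processing pressure. Each step in Embodiment 1 can be performed by any hardware in the storage device; in other words, this application does not limit which hardware of the storage device performs each step.
Embodiment 2 below illustrates Embodiment 1 in terms of the hardware of the storage device. In Embodiment 2, the storage device includes multiple processors, and S301 to S303 on the one hand and S304 on the other are performed by different processors. In other words, different processors of the storage device share the task of predicting service processing pressure and the task of data reconstruction, which relieves the processor responsible for data reconstruction.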
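Embodiment 2
Figure 7 is a flowchart of a data reconstruction method provided in an embodiment of this application. The method is applied to a storage device that includes a first processor, a second processor, and one or more hard disks.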
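The first processor and the second processor can be any two different processors. The first processor undertakes the processing corresponding to S301 to S303, and the second processor undertakes the processing corresponding to S304.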
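For example, the first processor is a GPU, a neural-network processing unit (NPU), or a CPU, or the first processor may be an integrated circuit, such as an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The first processor may be a single-core or a multi-core processor.
The second processor is, for example, a CPU, a network processor (NP), or a microprocessor, or may be one or more integrated circuits for implementing the solution of this application, for example an ASIC, a PLD, or a combination thereof. The PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. The second processor may be a single-core or a multi-core processor.
In one possible implementation, the first processor is a GPU and the second processor is a CPU.
For example, Embodiment 2 includes S401 to S405 below, where S401 corresponds to S301, S402 to S302, S403 to S303, and S405 to S304.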
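S401. The first processor obtains the service pressure information of the storage device.
S402. The first processor determines that the performance indicator of the storage device at the first reconstruction speed satisfies the preset condition.
S403. The first processor adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.
S404. The first processor sends the second reconstruction speed to the second processor.
S405. The second processor receives the second reconstruction speed from the first processor and reconstructs the data stored on the faulty disk in the storage device at the second reconstruction speed.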
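In the method provided in this embodiment, the first processor obtains the service pressure information and determines the adjusted reconstruction speed, and the second processor reconstructs data at the reconstruction speed computed by the first processor. Because the tasks of predicting service processing pressure and computing the reconstruction speed are offloaded to the first processor while the second processor performs data reconstruction, the processing pressure and overhead of the second processor are reduced. The second processor can therefore devote more of its computing power to other tasks, which helps improve its performance.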
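The division of labor between the two processors (S401 to S405) can be mimicked with two threads and a queue. This is only an analogy: the threads stand in for the first and second processors, the queue stands in for whatever channel carries the second reconstruction speed, and the linear speed formula is a made-up placeholder for S401 to S403.

```python
import queue
import threading
from typing import List


def first_processor(pressures: List[float], out: queue.Queue) -> None:
    """Stand-in for S401 to S404: compute a speed per predicted pressure
    and send it on. The linear formula is a hypothetical placeholder."""
    for p in pressures:
        out.put(200.0 - 180.0 * p)
    out.put(None)  # sentinel: no more speeds


def second_processor(inp: queue.Queue, applied: List[float]) -> None:
    """Stand-in for S405: receive each speed and apply it. Here the speed
    is only recorded; a real device would throttle reconstruction IO."""
    while True:
        speed = inp.get()
        if speed is None:
            break
        applied.append(speed)


def run_pipeline(pressures: List[float]) -> List[float]:
    """Run both stand-ins concurrently and return the applied speeds."""
    channel: queue.Queue = queue.Queue()
    applied: List[float] = []
    producer = threading.Thread(target=first_processor,
                                args=(pressures, channel))
    consumer = threading.Thread(target=second_processor,
                                args=(channel, applied))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return applied
```

The consumer never computes a prediction or a speed itself, which is the point of the offloading: its only work is to apply whatever speed it receives.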
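Embodiment 2 offloads S301 to S303 to another processor inside the storage device. Optionally, S301 to S303 are offloaded to a cloud device, which communicates with the storage device over a network. The cloud device is, for example, a host, a server, a personal computer, or another device with computing capability.
Embodiment 3 below describes the data reconstruction flow when a cloud device undertakes S301 to S303. In other words, Embodiment 3 concerns how the storage device interacts with a cloud device to dynamically adjust the reconstruction speed based on service processing pressure.
In one example application scenario, Embodiment 3 is applied in a distributed storage system in which the cloud device and the storage device are different nodes. For example, the storage device is a storage node of the distributed storage system, and the cloud device is a compute node of the distributed storage system, such as a storage client. Referring to Figure 2, for example, the cloud device in Embodiment 3 is server 205 or server 206 in system architecture 200, and the storage device in Embodiment 3 is server 201, server 202, server 203, or server 204 in system architecture 200.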
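Embodiment 3
Figure 8 is a flowchart of a data reconstruction method provided in an embodiment of this application. The interacting parties in this method are a cloud device and a storage device. For example, Embodiment 3 includes S501 to S505 below, where S501 corresponds to S301, S502 to S302, S503 to S303, and S505 to S304.
S501. The cloud device obtains the service pressure information.
S502. The cloud device determines that the performance indicator of the storage device at the first reconstruction speed satisfies the preset condition.
S503. The cloud device adjusts the first reconstruction speed according to the service pressure information to obtain the second reconstruction speed.
S504. The cloud device sends the second reconstruction speed to the storage device.
S505. The storage device receives the second reconstruction speed from the cloud device and reconstructs the data stored on the faulty disk in the storage device at the second reconstruction speed.
In the method provided in this embodiment, the cloud device obtains the service pressure information and determines the adjusted reconstruction speed, and the storage device reconstructs data at the reconstruction speed computed by the cloud device. Because the tasks of predicting service processing pressure and computing the reconstruction speed are offloaded to the cloud device while the storage device performs data reconstruction, the processing pressure and overhead of the storage device are reduced. The storage device can therefore devote more of its computing power to other tasks, which helps improve its performance.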
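The data reconstruction method of the embodiments of this application has been described above. The data reconstruction apparatus of the embodiments of this application is described below. It should be understood that the data reconstruction apparatus has any function of the storage device in the above method.
Figure 9 is a schematic structural diagram of a data reconstruction apparatus provided in an embodiment of this application. As shown in Figure 9, the data reconstruction apparatus 900 includes an obtaining module 901 for performing S301, an adjustment module 902 for performing S303, and a data reconstruction module 903 for performing S304. Optionally, the data reconstruction apparatus 900 further includes a determining module for performing S302.
It should be understood that the data reconstruction apparatus 900 corresponds to the storage device in Embodiment 1, Embodiment 2, or Embodiment 3. The modules of the data reconstruction apparatus 900 and the other operations and/or functions described above respectively implement the steps and methods performed by the storage device in Embodiment 1, Embodiment 2, or Embodiment 3. For details, see those embodiments; they are not repeated here for brevity.
It should be understood that the division of the data reconstruction apparatus 900 into the above functional modules is only an example. In practice, the functions can be assigned to different functional modules as needed; that is, the internal structure of the data reconstruction apparatus can be divided into different functional modules to perform all or some of the functions described above. In addition, the data reconstruction apparatus provided in the above embodiment is based on the same concept as Embodiment 1, Embodiment 2, and Embodiment 3; for its specific implementation, see those embodiments. Details are not repeated here.
The storage device of the embodiments of this application has been described above. Possible product forms of the storage device are described below.
It should be understood that any product of any form that has the features of the above storage device falls within the protection scope of this application. It should also be understood that the following description is only an example and does not limit the product form of the storage device of the embodiments of this application.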
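Figure 10 is a schematic structural diagram of a storage device provided in an embodiment of this application. The storage device 1000 includes a first processor 1001, a second processor 1011, a communication bus 1002, a memory 1003, at least one communication interface 1004, and one or more hard disks. The one or more hard disks include, for example, hard disk 102, hard disk 103, hard disk 104, and hard disk 105.
The first processor 1001 is configured to perform S401 to S404.
The second processor 1011 is configured to perform S405.
The communication bus 1002 transfers information between the components of the storage device 1000. The communication bus 1002 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in the figure, which does not mean that there is only one bus or one type of bus. The communication bus 1002 includes, but is not limited to, a peripheral component interconnect express (PCIe) bus, a memory fabric, Fibre Channel (FC), small computer system interface (SCSI), and Ethernet.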
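The memory 1003 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, and Blu-ray discs); a magnetic disk storage medium or another magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited to these. The memory 1003 may exist independently and be connected to the processor 1001 through the communication bus 1002, or it may be integrated with the processor 1001.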
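In some embodiments, the memory 1003 stores program code 1010 for executing the solution of this application, and the processor 1001 can execute the program code 1010 stored in the memory 1003. That is, the storage device 1000 can implement the data reconstruction method provided in the method embodiments through the processor 1001 and the program code 1010 in the memory 1003.
The communication interface 1004 uses any transceiver-type apparatus to communicate with other devices or communication networks. The communication interface 1004 includes a wired communication interface and may also include a wireless communication interface. The wired communication interface may be, for example, an Ethernet interface, which may be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface may be a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof. The transceiver communicates with other devices or communication networks over, for example but not limited to, Ethernet, a radio access network (RAN), or a WLAN.
In one embodiment, the storage device 1000 may further include an output device 1006 and an input device 1007. The output device 1006 communicates with the processor 1001 and can display information in multiple ways. For example, the output device 1006 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 1007 communicates with the processor 1001 and can receive user input in multiple ways. For example, the input device 1007 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
It should be understood that providing the first processor 1001 and the second processor 1011 separately is only an example. In other embodiments, the first processor 1001 and the second processor 1011 are integrated; they are one and the same processor of the storage device 1000, and that processor performs S301 to S304. For example, a single processor of the storage device performs both the task of predicting service processing pressure and the task of data reconstruction.
It should be understood that the storage devices in the above product forms each have any function of the storage device in the above method embodiments. Details are not repeated here.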
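A person of ordinary skill in the art will appreciate that the method steps and units described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the steps and components of each embodiment have been described above in general terms by function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A person of ordinary skill in the art may implement the described functions differently for each specific application, but such an implementation should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected as actually needed to achieve the purpose of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.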
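If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Any equivalent modification or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive).
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.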

Claims (11)

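  1. A data reconstruction method, wherein the method is applied to a storage device and comprises:
    obtaining service pressure information, wherein the service pressure information indicates the service processing pressure of the storage device;
    adjusting a first reconstruction speed according to the service pressure information to obtain a second reconstruction speed, wherein the first reconstruction speed is the current data reconstruction speed of the storage device, and the second reconstruction speed is negatively correlated with the service processing pressure; and
    reconstructing, at the second reconstruction speed, data stored on a faulty disk in the storage device.
  2. The method according to claim 1, wherein the adjusting a first reconstruction speed according to the service pressure information to obtain a second reconstruction speed comprises:
    obtaining an adjustment step according to the service pressure information, wherein the adjustment step is negatively correlated with the service processing pressure; and
    obtaining the second reconstruction speed according to the adjustment step and the first reconstruction speed, wherein the second reconstruction speed is the sum of the first reconstruction speed and the adjustment step.
  3. The method according to claim 1, wherein the reconstructing, at the second reconstruction speed, data stored on a faulty disk in the storage device comprises:
    reconstructing, at a first time point and at the second reconstruction speed, the data stored on the faulty disk, wherein the first time point is obtained by offsetting a second time point by a preset duration, and the second time point is a time point at which the service processing pressure changes.
  4. The method according to claim 3, wherein the second time point is a time point at which the service processing pressure drops, and the first time point is later than the second time point; or
    the second time point is a time point at which the service processing pressure rises, and the first time point is earlier than the second time point.
  5. The method according to claim 1, wherein the obtaining service pressure information comprises:
    inputting historical service pressure information into a prediction model, wherein the historical service pressure information indicates the service processing pressure of the storage device at historical time points; and
    processing the historical service pressure information through the prediction model to output the service pressure information.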
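  6. A storage device, comprising a first processor, a second processor, and one or more hard disks, wherein:
    the first processor is configured to obtain service pressure information, wherein the service pressure information indicates the service processing pressure of the storage device, and to adjust a first reconstruction speed according to the service pressure information to obtain a second reconstruction speed, wherein the first reconstruction speed is the current data reconstruction speed of the storage device, and the second reconstruction speed is negatively correlated with the service processing pressure; and
    the second processor is configured to reconstruct data stored on a faulty disk among the one or more hard disks.
  7. The storage device according to claim 6, wherein the first processor is configured to obtain an adjustment step according to the service pressure information, wherein the adjustment step is negatively correlated with the service processing pressure, and to obtain the second reconstruction speed according to the adjustment step and the first reconstruction speed, wherein the second reconstruction speed is the sum of the first reconstruction speed and the adjustment step.
  8. The storage device according to claim 6, wherein the first processor is configured to reconstruct, at a first time point and at the second reconstruction speed, the data stored on the faulty disk, wherein the first time point is obtained by offsetting a second time point by a preset duration, and the second time point is a time point at which the service processing pressure changes.
  9. The storage device according to claim 8, wherein the second time point is a time point at which the service processing pressure drops, and the first time point is later than the second time point; or the second time point is a time point at which the service processing pressure rises, and the first time point is earlier than the second time point.
  10. The storage device according to claim 6, wherein the first processor is configured to input historical service pressure information into a prediction model, wherein the historical service pressure information indicates the service processing pressure of the storage device at historical time points, and to process the historical service pressure information through the prediction model to output the service pressure information.
  11. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the instruction is read by a processor to cause a storage device to perform the method according to any one of claims 1 to 5.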
PCT/CN2020/111144 2020-02-10 2020-08-25 Data reconstruction method, storage device, and storage medium WO2021159687A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010085179.0A CN113254256B (zh) 2020-02-10 2020-02-10 数据重构方法、存储设备及存储介质
CN202010085179.0 2020-02-10

Publications (1)

Publication Number Publication Date
WO2021159687A1 true WO2021159687A1 (zh) 2021-08-19

Family

ID=77219644

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111144 WO2021159687A1 (zh) 2020-02-10 2020-08-25 数据重构方法、存储设备及存储介质

Country Status (2)

Country Link
CN (1) CN113254256B (zh)
WO (1) WO2021159687A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637466A (zh) * 2022-03-03 2022-06-17 深圳大学 一种数据读写行为推测方法、装置、存储介质及电子设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117608502B (zh) * 2024-01-24 2024-07-02 济南浪潮数据技术有限公司 分布式存储系统的数据重构管理方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140009766A (ko) * 2012-07-13 2014-01-23 네이버비즈니스플랫폼 주식회사 데이터베이스 복구 속도 조절 방법 및 장치
CN107391317A (zh) * 2017-09-14 2017-11-24 郑州云海信息技术有限公司 一种数据恢复的方法、装置、设备及计算机可读存储介质
CN109117306A (zh) * 2018-07-24 2019-01-01 广东浪潮大数据研究有限公司 一种基于对象读写时延调整数据恢复速度的方法及装置
CN109144782A (zh) * 2018-08-22 2019-01-04 郑州云海信息技术有限公司 一种数据恢复方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006285803A (ja) * 2005-04-04 2006-10-19 Sony Corp データ記憶装置、再構築制御装置、再構築制御方法、プログラム及び記憶媒体
CN106776044B (zh) * 2017-01-11 2020-02-04 深圳鲲云信息科技有限公司 基于数据流的硬件加速方法及系统
CN107729200A (zh) * 2017-10-20 2018-02-23 郑州云海信息技术有限公司 一种存储系统性能的测试方法及相关装置
CN110413454B (zh) * 2018-04-28 2022-04-05 华为技术有限公司 基于存储阵列的数据重建方法、装置及存储介质
CN109359019A (zh) * 2018-08-15 2019-02-19 中国平安人寿保险股份有限公司 应用程序性能监控方法、装置、电子设备及存储介质
CN110109628B (zh) * 2019-05-20 2022-08-09 深信服科技股份有限公司 分布式存储系统的数据重建方法、装置、设备及存储介质
CN110515917B (zh) * 2019-08-09 2022-12-02 苏州浪潮智能科技有限公司 一种控制重构速度的方法、装置及介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637466A (zh) * 2022-03-03 2022-06-17 深圳大学 一种数据读写行为推测方法、装置、存储介质及电子设备
CN114637466B (zh) * 2022-03-03 2022-11-11 深圳大学 一种数据读写行为推测方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN113254256B (zh) 2023-08-22
CN113254256A (zh) 2021-08-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919112

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20919112

Country of ref document: EP

Kind code of ref document: A1