WO2023029680A1 - Method and device for determining the usable time of a disk - Google Patents

Method and device for determining the usable time of a disk

Info

Publication number
WO2023029680A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
usage
period
predicted
fluctuation value
Application number
PCT/CN2022/100508
Other languages
English (en)
French (fr)
Inventor
邱文
卢道和
罗锶
曾可
关俊
姚正杰
谢军
陈楚曦
朱国雄
万亿兵
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023029680A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device

Definitions

  • Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a method and device for determining the usable time of a disk.
  • Fintech: financial technology
  • in the prior art, the average daily usage increment is calculated from the disk usage within 30 days, and the usable time of the disk is predicted from the current remaining capacity of the disk and the average daily usage increment.
  • in the prior art, the usable time of the disk is also predicted after removing a minimum value, removing a maximum value, or removing both a maximum value and a minimum value from the disk usage.
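As a non-limiting illustration, the prior-art estimate described above can be sketched as follows. The function names, the 30 sample values, and the choice to return infinity when usage does not grow are assumptions made for this example and do not appear in the source.

```python
def naive_usable_days(daily_usage, remaining_capacity):
    """Prior-art style estimate: remaining capacity / average daily increment."""
    avg_increment = sum(daily_usage) / len(daily_usage)
    if avg_increment <= 0:
        # Assumption: with no net growth, no exhaustion date is predicted.
        return float("inf")
    return remaining_capacity / avg_increment


def drop_min_and_max(daily_usage):
    """Remove exactly one minimum and one maximum value before averaging."""
    return sorted(daily_usage)[1:-1]


# Made-up data: 28 ordinary days of 2 GB plus two noise points.
usage = [2.0] * 28 + [40.0, -10.0]
print(naive_usable_days(usage, 300.0))                    # skewed by the noise
print(naive_usable_days(drop_min_and_max(usage), 300.0))  # noise removed
```

With only one spike and one dip the trimmed estimate recovers the ordinary rate, but any additional noise point would remain in the average, which is the limitation the description goes on to discuss.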
  • Embodiments of the present invention provide a method and device for determining the usable time of a disk, so as to effectively improve the accuracy of determining the usable time of a disk.
  • the embodiment of the present invention provides a method for determining the usable time of a disk, including:
  • the prediction period is the i-th period;
  • if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is greater than the first set threshold, the (i+1)-th period is used as the prediction period, and the method returns to acquiring the disk usage of the disk to be predicted in each period within the sliding window before the prediction period;
  • this is repeated until, at the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is less than or equal to the first set threshold, whereupon the usable duration of the disk to be predicted in the (i-1)-th period is determined to be j periods.
  • the disk usage prediction function is constructed according to the disk usage of the disk to be predicted in each period within the sliding window before the prediction period (i.e., the i-th period), and the disk usage prediction function is used to determine the predicted disk usage of the disk to be predicted in the prediction period.
  • if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is greater than the first set threshold, the (i+1)-th period is used as the prediction period and the method returns to acquiring the disk usage of the disk to be predicted in each period within the sliding window before the prediction period, which is equivalent to acquiring the disk usage in each period within the sliding window before the (i+1)-th period. In this way, the predicted disk usage of the disk to be predicted in each prediction period is determined cyclically. After j returns, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is less than or equal to the first set threshold and the loop exits, so the usable duration of the disk to be predicted in the (i-1)-th period is dynamically and accurately calculated as j periods.
  • the solution makes the disk usage prediction function constructed in each cycle better reflect the real disk usage of each period within the current sliding window, so the usable time of the disk to be predicted in the (i-1)-th period can be determined more accurately, which effectively improves the accuracy of determining the usable time of the disk.
  • since the solution determines the predicted disk usage of the disk to be predicted in each prediction period through cyclic execution, it automatically calculates the usable time of the disk to be predicted in the (i-1)-th period. This avoids excessive manual intervention, reduces the time and labor spent on manually determining the usable time of the disk to be predicted, and improves the efficiency of calculating the usable time of the disk to be predicted in the (i-1)-th period.
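As a non-limiting illustration, the cyclic procedure described above can be sketched as follows. Several details are assumptions made only for this example: a least-squares straight line stands in for the disk usage prediction function, each predicted usage is appended to the history so that the sliding window can advance into future periods, the remaining capacity is reduced by each predicted usage, and the names `fit_linear`, `usable_periods` and `max_periods` are hypothetical.

```python
def fit_linear(window):
    """Least-squares line through (0, w0), (1, w1), ...; returns a predictor."""
    n = len(window)
    mean_t = (n - 1) / 2
    mean_y = sum(window) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(window))
    slope = cov / var_t if var_t else 0.0
    return lambda t: mean_y + slope * (t - mean_t)


def usable_periods(history, remaining, window_size, threshold, max_periods=10_000):
    """Return j, the number of returns before remaining - predicted <= threshold."""
    usage = list(history)
    j = 0
    while j < max_periods:
        window = usage[-window_size:]            # sliding window before the prediction period
        predicted = fit_linear(window)(len(window))
        if remaining - predicted <= threshold:   # exit condition of the loop
            return j
        remaining -= predicted                   # assumed capacity left before the next period
        usage.append(predicted)                  # the window slides over the predicted value
        j += 1
    return None                                  # guard: no exhaustion within max_periods


print(usable_periods([2.0] * 5, 10.0, window_size=5, threshold=0.0))
```

The `max_periods` guard is not part of the described method; it only keeps the sketch from looping forever when the predicted usage is zero or negative.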
  • the constructing a disk usage prediction function according to the disk usage in each time period within the sliding window includes:
  • generating n function parameter groups corresponding to the prediction function construction mode; based on the n function parameter groups, executing the prediction function construction mode in turn to determine the loss function value corresponding to each of the n function parameter groups; and comparing these loss function values to determine the minimum loss function value and the function parameter group corresponding to the minimum loss function value;
  • the disk usage prediction function is constructed.
  • n function parameter groups corresponding to the prediction function construction mode are generated, and the prediction function construction mode is executed in turn with each of the n function parameter groups. This yields the loss function value corresponding to each of the n function parameter groups; comparing these values under the prediction function construction mode gives the minimum loss function value and the function parameter group corresponding to it.
  • the minimum loss function values corresponding to the m prediction function construction modes can then be compared to determine the smallest of them, and the function parameter group corresponding to that smallest minimum loss function value is used as the target function parameter group. The disk usage prediction function is then constructed according to the target function parameter group. By setting m prediction function construction modes, the solution makes the constructed disk usage prediction function more realistic and accurate, so the predicted disk usage of the disk in a given prediction period can be determined more accurately.
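As a non-limiting illustration, selecting the target function parameter group across several construction modes can be sketched as follows. The two modes ("constant" and "linear"), the candidate parameter groups, and all names are assumptions made for this example; the source does not specify which modes are used or how the n parameter groups are generated.

```python
def squared_loss(func, usages):
    """Accumulated squared difference between func(t) and the real usage."""
    return sum((func(t) - y) ** 2 for t, y in enumerate(usages))


# Hypothetical construction modes: each turns a parameter group into a
# function of the period index t.
MODES = {
    "constant": lambda p: (lambda t: p[0]),
    "linear": lambda p: (lambda t: p[0] + p[1] * t),
}


def select_target_parameters(usages, candidates):
    """candidates maps a mode name to its n parameter groups.

    Returns (loss, mode, params) for the smallest per-mode minimum loss.
    """
    best = None
    for mode, groups in candidates.items():
        build = MODES[mode]
        losses = [(squared_loss(build(p), usages), p) for p in groups]
        mode_loss, mode_params = min(losses, key=lambda item: item[0])
        if best is None or mode_loss < best[0]:
            best = (mode_loss, mode, mode_params)
    return best


usages = [1.0, 2.0, 3.0]  # made-up usage for the three periods in the window
candidates = {
    "constant": [(1.0,), (2.0,)],
    "linear": [(1.0, 1.0), (0.0, 1.0)],
}
loss, mode, params = select_target_parameters(usages, candidates)
predict = MODES[mode](params)          # the constructed prediction function
print(mode, params, loss, predict(3))  # predict(3) is the next period
```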
  • executing the prediction function construction mode sequentially to determine the respective loss function values corresponding to the n function parameter groups includes:
  • the loss function value corresponding to the function parameter group is determined from the predicted disk usage and the real disk usage corresponding to each period.
  • the function corresponding to the function parameter group is constructed, and each period in the sliding window is input into this function to calculate the predicted disk usage corresponding to that period. The difference between the predicted disk usage and the real disk usage is calculated for each period, and either the differences or the squares of the differences are accumulated over all periods to obtain the loss function value corresponding to the function parameter group.
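As a non-limiting illustration, the two accumulation variants mentioned above can be sketched as follows. Taking the absolute value of each difference before accumulating is an assumption made here so that positive and negative differences do not cancel; the linear form of the function, the sample data, and all names are likewise assumptions.

```python
def abs_loss(predicted, actual):
    """Accumulated absolute difference between predicted and real usage."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))


def squared_loss(predicted, actual):
    """Accumulated squared difference between predicted and real usage."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def loss_for_parameters(params, actual, loss=squared_loss):
    """Build the (assumed linear) function for one parameter group and score it."""
    intercept, slope = params
    predicted = [intercept + slope * t for t in range(len(actual))]
    return loss(predicted, actual)


actual = [2.0, 2.5, 3.0, 5.0]  # made-up real usage per period, in GB
print(loss_for_parameters((2.0, 0.5), actual))            # squared differences
print(loss_for_parameters((2.0, 0.5), actual, abs_loss))  # absolute differences
```

The squared variant penalizes the single large deviation in the last period more heavily than the absolute variant does.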
  • using the disk usage prediction function to determine the predicted disk usage of the disk to be predicted in the prediction period includes:
  • the solution smooths the first disk usage that the disk usage prediction function determines for the disk in a given prediction period, so that the resulting second disk usage does not differ too much from the disk usage of the disk in a normal period. This eliminates the impact that abnormal factors, such as a sudden increase or decrease in disk usage in one or more periods within the sliding window, have on the accuracy of the determined usable time of the disk in a given period. In other words, it removes the interference that one or more data spikes or dips within the sliding window cause when determining the usable time of the disk, which helps determine the usable time of the disk in a given period more realistically and accurately and avoids the problem that the usable time of the disk in multiple consecutive periods is discontinuous (that is, the usable time of the disk across multiple periods is suddenly high and suddenly low).
  • if the disk usage within 30 days contains multiple data noise points, the prior-art solution still leaves one or several of them in place, because it removes only one minimum value, one maximum value, or one maximum value and one minimum value. Alternatively, if one or two data noise points appear in the disk usage within 30 days and are removed, the usable time of the disk in multiple consecutive periods becomes suddenly high and suddenly low, that is, discontinuous. The technical solution in the present invention instead smooths the disk usage that the disk usage prediction function determines for the disk in a given prediction period, which solves the prior-art problem that the usable duration of the disk in multiple consecutive periods is not continuous.
  • the smoothing the first disk usage to determine the second disk usage includes:
  • the second disk usage is determined.
  • the technical solution in the present invention configures different smoothing coefficients for different usage fluctuation value intervals, so that the first disk usage corresponding to each usage fluctuation value is smoothed with the smoothing coefficient of the usage fluctuation value interval that contains that value. The smoothing coefficient of each usage fluctuation value interval therefore better meets the actual needs of that interval and is more flexible.
  • the smoothing coefficient can also be adjusted adaptively, so that the usable time of the disk in this period changes more smoothly.
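As a non-limiting illustration, looking up a smoothing coefficient by usage fluctuation value interval can be sketched as follows. This passage does not give the smoothing formula or define the usage fluctuation value, so the sketch assumes that the fluctuation is the absolute deviation of the first disk usage from a reference usage (for example, the window average) and that smoothing scales that deviation by the coefficient. The interval bounds, coefficients, and names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical smoothing coefficient record: upper bounds (GB) of the usage
# fluctuation value intervals [0, 1), [1, 5), [5, inf) and one coefficient each.
BOUNDS = [1.0, 5.0]
COEFFS = [1.0, 0.5, 0.1]


def smooth(first_usage, reference, bounds=BOUNDS, coeffs=COEFFS):
    """Return the second disk usage for a predicted first disk usage."""
    fluctuation = abs(first_usage - reference)         # assumed definition
    coeff = coeffs[bisect_right(bounds, fluctuation)]  # coefficient of the matching interval
    return reference + coeff * (first_usage - reference)


print(smooth(2.5, 2.0))   # small fluctuation: kept as predicted
print(smooth(4.0, 2.0))   # medium fluctuation: deviation halved
print(smooth(12.0, 2.0))  # spike: deviation damped strongly
```

Under these assumptions, larger fluctuations fall into intervals with smaller coefficients, so a spike moves the smoothed value only slightly away from the reference.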
  • the smoothing coefficient records are determined in the following manner:
  • based on the adjusted first smoothing coefficient and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to determining the usable duration of the disk in multiple consecutive periods, until the duration fluctuation values of the usable durations of any adjacent periods are all less than or equal to the second set threshold, thereby determining the second smoothing coefficient corresponding to each usage fluctuation value interval;
  • the second smoothing coefficient corresponding to each usage fluctuation value interval is stored in the smoothing coefficient record.
  • the technical solution in the present invention sets a first smoothing coefficient for each usage fluctuation value interval and a preset step size for adjusting the first smoothing coefficients.
  • the process of determining the usable time of the disk in multiple consecutive periods is executed cyclically, continuously updating the first smoothing coefficient corresponding to each usage fluctuation value interval, until the duration fluctuation value of the usable durations of any adjacent periods among the multiple consecutive periods is less than or equal to the second set threshold. The second smoothing coefficient determined in this way for each usage fluctuation value interval better meets the actual needs of that interval. In addition, if determining the usable time of the disk in a given period involves different usage fluctuation value intervals, each is smoothed with its own smoothing coefficient. The approach is therefore more flexible: each smoothing coefficient adaptively adjusts the corresponding first disk usage, so that the usable duration of the disk in a given period changes more smoothly.
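As a non-limiting illustration, the cyclic adjustment of the first smoothing coefficients can be sketched as follows. The source states only that at least one coefficient is adjusted by a preset step size; which interval is adjusted, the direction of the adjustment, the `durations_for` callback that recomputes the usable durations, and the `max_rounds` guard are all assumptions made for this example.

```python
def tune_coefficients(first_coeffs, durations_for, step, second_threshold,
                      adjust_index=-1, max_rounds=100):
    """Adjust one coefficient by `step` until adjacent durations are smooth.

    durations_for(coeffs) is a hypothetical callback returning the usable
    durations of the disk in multiple consecutive periods for given coefficients.
    """
    coeffs = list(first_coeffs)
    for _ in range(max_rounds):
        durations = durations_for(coeffs)
        jumps = [abs(b - a) for a, b in zip(durations, durations[1:])]
        if all(jump <= second_threshold for jump in jumps):
            return coeffs  # these become the second smoothing coefficients
        # Assumed direction: damp the chosen interval more strongly.
        coeffs[adjust_index] = max(0.0, coeffs[adjust_index] - step)
    return None  # guard: no satisfactory coefficients found


def toy_durations(coeffs):
    """Made-up stand-in: the middle period's duration grows with the last coefficient."""
    return [10.0, 10.0 + 20.0 * coeffs[-1], 10.0]


second_coeffs = tune_coefficients([1.0, 0.5], toy_durations, step=0.1, second_threshold=5.0)
print(second_coeffs)
```

In a fuller version, `durations_for` would rerun the usable-time prediction for each of the consecutive periods with the candidate coefficients, and the returned list would be stored in the smoothing coefficient record.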
  • adjusting the first smoothing coefficient corresponding to at least one usage fluctuation value interval based on the preset step size and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to determining the usable duration of the disk in multiple consecutive periods, includes:
  • adjusting the first smoothing coefficient corresponding to any one usage fluctuation value interval based on the preset step size, for example the first smoothing coefficient corresponding to the first usage fluctuation value interval. If, when the usable durations of the disk in multiple consecutive periods are determined based on the adjusted first smoothing coefficient of the first usage fluctuation value interval and the first smoothing coefficients of the usage fluctuation value intervals other than the first, the duration fluctuation value of the usable durations of any adjacent periods is less than or equal to the second set threshold, then the adjusted first smoothing coefficient of the first usage fluctuation value interval and the first smoothing coefficients of the other usage fluctuation value intervals are used as the final smoothing coefficients corresponding to each usage fluctuation value interval.
  • adjusting the first smoothing coefficients corresponding to at least two usage fluctuation value intervals based on the preset step size, for example the first smoothing coefficients corresponding to the first and the second usage fluctuation value intervals. If, when the usable durations of the disk in multiple consecutive periods are determined based on the adjusted first smoothing coefficients of the first and second usage fluctuation value intervals and the first smoothing coefficients of the usage fluctuation value intervals other than the first and second, the duration fluctuation value of the usable durations of any adjacent periods is less than or equal to the second set threshold, then the adjusted first smoothing coefficients of the first and second usage fluctuation value intervals and the first smoothing coefficients of the other usage fluctuation value intervals are used as the final smoothing coefficients corresponding to each usage fluctuation value interval.
  • the embodiment of the present invention also provides a device for determining the usable time of a disk, including:
  • An acquisition unit configured to acquire the disk usage of the disk to be predicted in each period of the sliding window before the forecast period; the forecast period is the i-th period;
  • a processing unit configured to construct a disk usage prediction function according to the disk usage in each period within the sliding window; determine the predicted disk usage of the disk to be predicted in the prediction period by using the disk usage prediction function; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is greater than the first set threshold, use the (i+1)-th period as the prediction period and return to acquiring the disk usage of the disk to be predicted in each period within the sliding window before the prediction period, until, at the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage of the prediction period is less than or equal to the first set threshold, thereby determining that the usable duration of the disk to be predicted in the (i-1)-th period is j periods.
  • an embodiment of the present invention provides a computing device, including at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor executes the method for determining the usable time of a disk described in any of the first aspects above.
  • an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device executes the method for determining the usable time of a disk described in any of the first aspects above.
  • FIG. 1 is a schematic diagram of a possible system architecture provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for determining the usable time of a disk provided by an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a device for determining the usable time of a disk provided by an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • the system architecture for determining the usable time of a disk may include a monitoring device 100, a data processing system 200, and at least one computer device (such as computer device 301, computer device 302, and computer device 303).
  • the monitoring device 100 and the data processing system 200 can be communicatively connected by wire or wirelessly, as can the monitoring device 100 and each computer device.
  • the monitoring device 100 monitors the disk of each computer device in real time and obtains the disk usage of each disk in real time. The monitoring device 100 can then periodically (for example every 5, 10 or 30 minutes; every 1, 2 or 5 hours; or every 1, 2 or 5 days) send the disk usage data of the disks of the computer devices to the data processing system 200, which stores the received data. Alternatively, the data can be sent in response to a data acquisition request from the data processing system 200. The data acquisition request instructs the monitoring device 100 to provide the disk usage data of the disks of one or several computer devices, and the monitoring device 100 sends that data to the data processing system 200.
  • the data processing system 200 may also specify that it wants the disk usage data of the disks of one or several computer devices for one or more periods (such as 5 days, 10 days or 20 days).
  • the data processing system 200 may be a single server (such as an independent physical server), or a server cluster or a distributed system composed of multiple physical servers.
  • a user sends the data processing system 200 a request to query the usable time of the disks of one or several computer devices in a certain period, for example the usable time of the disk of the computer device 301 in a certain period. After receiving the query request, the data processing system 200 can obtain, from the locally stored disk usage data of the disks of each computer device, the disk usage data of the disk of the computer device 301 in each period within the sliding window before the prediction period (for example, the period immediately after the current period: if the current period is day 1 of month xx, year xxxx, the prediction period can be day 2 of month xx, year xxxx). It then processes this data to determine the usable time of the disk of the computer device 301 in a certain period and sends the result to the client where the user is located for display.
  • alternatively, the data processing system 200 may send the monitoring device 100 a request for the disk usage data of the disk of the computer device 301 in each period within the sliding window (such as 5, 10, 20 or 30 days) before the prediction period. After receiving the data acquisition request, the monitoring device 100 sends that disk usage data to the data processing system 200. The data processing system 200 then processes the received data to determine the usable time of the disk of the computer device 301 in a certain period and sends the result to the client where the user is located for display.
  • FIG. 1 is only an example, which is not limited in this embodiment of the present invention.
  • FIG. 2 exemplarily shows the flow of a method for determining the usable time of a disk provided by an embodiment of the present invention, and the flow can be executed by an apparatus for determining the usable time of a disk.
  • the process specifically includes:
  • Step 201 acquire the disk usage of the disk to be predicted in each time period within the sliding window before the prediction time period.
  • the disk in a computer device (such as a notebook computer or a desktop computer) is one of the main storage media of the computer device: it stores a large amount of the device's data and can prevent data loss. Determining disk usage and the usable time of the disk promptly and accurately therefore plays an important role in the system operation and maintenance of computer devices, and the usable time of a disk is usually predicted for one or several periods to provide reference support for subsequent reasonable use of the disk.
  • the disk usage of a period is the sum of the amount occupied and the amount cleaned on the disk in that period. It can be positive (the amount occupied in the period is greater than the amount cleaned) or negative (the amount occupied in the period is smaller than the amount cleaned). For example, if the amount occupied in a period is 2G and the amount cleaned in that period is 0, the disk usage of the disk in that period is 2G; if the amount cleaned in that period is -0.5G, the disk usage of the disk in that period is 1.5G.
  • the disk usage of the disk during this period is -1G; or, if the amount cleaned during this period is -4G, the disk usage of the disk during this period is -2G.
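As a non-limiting illustration, the per-period disk usage defined above can be computed as follows. The function name is hypothetical, and the cleaned amount is passed as zero or a negative number, following the sign convention of the examples in the description.

```python
def period_usage(occupied_gb, cleaned_gb):
    """Disk usage of one period: occupied amount plus (non-positive) cleaned amount."""
    return occupied_gb + cleaned_gb


print(period_usage(2.0, 0.0))   # 2.0: nothing was cleaned
print(period_usage(2.0, -0.5))  # 1.5: net growth
print(period_usage(2.0, -4.0))  # -2.0: more was cleaned than written
```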
  • the disk usage data of a disk in one or more periods can be obtained through a monitoring component or a monitoring device. A monitoring component can be set in a computer device to monitor the disk in that device and obtain its disk usage data in real time; or a separate monitoring device can be set outside the computer devices to monitor the disks of multiple computer devices and obtain their disk usage data in real time. The monitoring component then reports the disk usage data of the disk in its computer device to the data processing system, or the monitoring device reports the disk usage data of the disks of multiple computer devices to the data processing system.
  • after the data processing system obtains the disk usage data of the disks of one or more computer devices, it can automatically process the disk usage data of each period within the sliding window (for example 5, 10, 20 or 30 days) before the prediction period (such as the i-th period) to determine the usable time of the disks of the one or more computer devices in the (i-1)-th period. Alternatively, it can determine that usable time in response to a request, sent by a user through the client, to query the usable time of the disks of one or more computer devices.
  • the device for determining the usable time of a disk can be set in the data processing system as a functional part of it. The method for determining the usable time of a disk can be executed by the device for determining the usable time of a disk, or by a chip or an integrated circuit set within that device.
  • the data processing system can automatically calculate the usable time of the disk of a computer device (such as computer device A) in a certain period (such as the current period, for example day 5 of month xx, year xxxx); it can then obtain the disk usage data of computer device A in each period within the sliding window (such as 30 days) before xx.
  • alternatively, the data processing system calculates the usable time of the disk of a computer device (such as computer device A) in a certain period (such as the current period, for example day 5 of month xx, year xxxx) in response to a usable-time query request sent by a user through the client. In that case the data processing system can obtain, from local storage, the disk usage data of computer device A in each period within the sliding window (for example, 30 days) before xx, or it can send the monitoring device a request for the disk usage data of computer device A in each period within the sliding window before day 6 of month xx, year xxxx, and obtain that data from the monitoring device.
  • Step 202 constructing a disk usage prediction function according to the disk usage in each time period within the sliding window.
  • a disk usage prediction function for predicting the disk usage of the disk to be predicted in the prediction period (such as the i-th period) can be fitted.
  • n function parameter groups corresponding to the prediction function construction mode are generated. Then, based on each of the n function parameter groups, the prediction function construction mode is executed in turn to determine the loss function value corresponding to each of the n function parameter groups under that mode. Comparing these loss function values gives the minimum loss function value and the function parameter group corresponding to it.
  • the smallest of the minimum loss function values can then be determined, and the function parameter group corresponding to it is used as the target function parameter group, so that the disk usage prediction function can be accurately constructed.
  • to determine the loss function value corresponding to each of the n function parameter groups: for each function parameter group, the function corresponding to that group is constructed, and each period in the sliding window is input into it to determine the predicted disk usage corresponding to each period.
  • the loss function value corresponding to the function parameter group is then calculated by taking the difference between the predicted disk usage and the real disk usage for each period and accumulating the squares of these differences over all periods.
  • by setting m prediction function construction modes, the solution makes the constructed disk usage prediction function more realistic and accurate, so the predicted disk usage of the disk in a given prediction period can be determined more accurately.
  • set 8 predictive function construction modes and execute the process of each predictive function construction mode in sequence, for example, execute the process of the first predictive function construction mode, the process of the second predictive function construction mode, etc., and so on Calculate the optimal parameter group ⁇ corresponding to each prediction function construction mode and the optimal parameter n corresponding to the optimal parameter group ⁇ in turn, and also calculate the minimum loss error value corresponding to each prediction function construction mode in turn, and then Compare the minimum loss error values corresponding to each of the 8 prediction function construction modes to determine the minimum minimum loss error value, and determine the optimal parameter group ⁇ corresponding to the minimum minimum loss error value and the optimal parameter group ⁇
  • the disk usage prediction function can be fitted according to the optimal parameter group ⁇ corresponding to the smallest minimum loss error value and the optimal parameter n corresponding to the optimal parameter group ⁇ .
  • The embodiment of the present invention can construct the framework of the F(x) function by using the poly1d function of the Python numpy library, and can use the leastsq least-squares function of the scipy library to fit the disk usage prediction function.
  • The following describes the fitting of the disk usage prediction function through an exemplary execution script, namely:
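  • The exemplary script itself does not appear in this text. A minimal sketch of such a fit, assuming a polynomial F(x) built with numpy.poly1d and fitted with scipy.optimize.leastsq, might look like the following; the function names, the degree, the starting guess, and the data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import leastsq


def residuals(coeffs, periods, usage):
    """Difference between the polynomial F(x) and the observed usage."""
    return np.poly1d(coeffs)(periods) - usage


def fit_usage_function(periods, usage, degree=1):
    """Fit a polynomial of the given degree; returns a callable np.poly1d."""
    initial = np.ones(degree + 1)                 # arbitrary starting guess
    coeffs, _ = leastsq(residuals, initial, args=(periods, usage))
    return np.poly1d(coeffs)


periods = np.arange(1, 31, dtype=float)           # a 30-period sliding window
usage = 2.0 * periods + 10.0                      # hypothetical, perfectly linear usage
predict = fit_usage_function(periods, usage, degree=1)
day_31 = float(predict(31))                       # predicted usage for the next period
```

  • Trying several degrees and keeping the fit with the smallest accumulated squared residual is one way the different construction modes could be compared.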
  • Step 203: Determine the predicted disk usage of the disk to be predicted in the prediction period by using the disk usage prediction function.
  • In this solution, the first disk usage determined by the disk usage prediction function for a prediction period is smoothed, so that the resulting second disk usage does not differ too much from the normal disk usage of the disk in a normal period and stays within the normal disk usage range of the disk.
  • This eliminates the impact that abnormal factors, such as a sudden increase or decrease of disk usage in one or more periods within the sliding window, have on the accuracy of the determined usable duration of the disk. It helps to determine the usable duration of the disk in a given period more realistically and accurately, and keeps the usable durations determined for consecutive periods continuous.
  • Specifically, the disk usage prediction function is first used to determine the first disk usage of the disk to be predicted in the i-th period. The second disk usage is then determined by smoothing the first disk usage, and the second disk usage is taken as the predicted disk usage of the disk to be predicted in the prediction period.
  • The predicted disk usage of the disk to be predicted in the prediction period can be calculated in the following manner, namely:
  • Here S(t) represents the predicted disk usage obtained after smoothing the first disk usage of the disk to be predicted in the t-th period (that is, the prediction period); F(t) represents the first disk usage of the disk to be predicted in the t-th period; S(t-1) represents the predicted disk usage obtained after smoothing the first disk usage of the disk to be predicted in the (t-1)-th period; and α represents the smoothing coefficient used for smoothing the first disk usage of the disk to be predicted in the t-th period.
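  • The formula itself is not legible in this text. One form consistent with the symbol definitions is standard first-order exponential smoothing, given here as an assumption rather than as the patent's exact expression, with α used as an assumed symbol for the smoothing coefficient:

```latex
S(t) = \alpha \, F(t) + (1 - \alpha) \, S(t-1)
```

  • Under this assumed form, α = 0.1, F(t) = 120 and S(t-1) = 100 would give S(t) = 0.1 × 120 + 0.9 × 100 = 102, so a sudden jump in the raw prediction moves the smoothed value only slightly.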
  • When smoothing the first disk usage, the third disk usage of the disk to be predicted in the (i-1)-th period is first determined through the disk usage prediction function, and the usage fluctuation value between the first disk usage and the third disk usage is calculated. That is, the ratio of the absolute value of the difference between the first disk usage and the third disk usage to the third disk usage is taken as the usage fluctuation value.
  • The usage fluctuation value interval in which the usage fluctuation value falls is then looked up in the smoothing coefficient record, and the smoothing coefficient corresponding to that interval is used as the smoothing coefficient for smoothing the first disk usage. The second disk usage is then determined based on the smoothing coefficient and the first disk usage.
  • For example, the first disk usage on the 31st day and the third disk usage on the 30th day can both be calculated through the disk usage prediction function. The usage fluctuation value between them is calculated, the usage fluctuation value interval in which it falls is determined from the smoothing coefficient record, and the smoothing coefficient α_i corresponding to that interval is thereby determined.
  • The technical solution of the present invention configures different smoothing coefficients for different usage fluctuation value intervals, so that the first disk usage corresponding to each usage fluctuation value within an interval is smoothed with the smoothing coefficient corresponding to that interval.
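  • The lookup and smoothing steps above can be sketched as follows. The record layout and function names are assumptions, the coefficient values mirror Table 2 of this document, and the smoothing expression is the assumed exponential-smoothing form rather than a confirmed formula.

```python
# Smoothing coefficient record: (exclusive upper bound of the interval, coefficient).
# Values mirror Table 2 of this document; the data layout is an assumption.
SMOOTHING_RECORD = [(0.30, 0.05), (0.60, 0.10), (float("inf"), 0.20)]


def usage_fluctuation(first_usage: float, third_usage: float) -> float:
    """|first - third| / third, the usage fluctuation value."""
    return abs(first_usage - third_usage) / third_usage


def lookup_coefficient(fluctuation: float, record=SMOOTHING_RECORD) -> float:
    """Return the coefficient of the interval that contains the fluctuation value."""
    for upper, coeff in record:
        if fluctuation < upper:
            return coeff
    raise ValueError("record does not cover the fluctuation value")


def smooth(first_usage: float, third_usage: float, previous_smoothed: float) -> float:
    """Assumed exponential-smoothing form: a * F(t) + (1 - a) * S(t-1)."""
    a = lookup_coefficient(usage_fluctuation(first_usage, third_usage))
    return a * first_usage + (1 - a) * previous_smoothed


# A 50% jump falls in the middle interval, so the coefficient 0.10 is used.
second_usage = smooth(first_usage=150.0, third_usage=100.0, previous_smoothed=100.0)
```

  • The sketch does not guard against a third disk usage of zero; a real implementation would need to decide how to treat that case.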
  • The smoothing coefficient record may be determined in the following manner. First, a first smoothing coefficient is set for each usage fluctuation value interval, and a preset step size for adjusting the first smoothing coefficients is set. Then, based on the disk usage of the disk in multiple historical periods (such as the disk usage within a certain 30 days generated during operation of the disk) and the first smoothing coefficient corresponding to at least one usage fluctuation value interval (such as one interval, any two intervals, or three intervals), the usable durations of the disk in multiple consecutive periods are determined.
  • Next, the duration fluctuation value of the usable durations of any adjacent periods among the multiple consecutive periods is calculated. For example, the ratio of the absolute value of the difference between the usable duration of the disk in a first prediction period (such as the k-th period) and the usable duration of the disk in the adjacent second prediction period (such as the (k+1)-th period) to the usable duration of the disk in the first prediction period is taken as the duration fluctuation value between the two periods.
  • If any duration fluctuation value is greater than the second set threshold, the first smoothing coefficient corresponding to at least one usage fluctuation value interval is adjusted based on the preset step size. Based on the adjusted first smoothing coefficient and the unadjusted first smoothing coefficients of the other usage fluctuation value intervals, the process returns to determining the usable durations of the disk in multiple consecutive periods.
  • This repeats until the duration fluctuation values of the usable durations of all adjacent periods among the multiple consecutive periods are less than or equal to the second set threshold. The second smoothing coefficients determined in this way better meet the actual needs of each usage fluctuation value interval.
  • The first adjustment method adjusts, based on the preset step size, the first smoothing coefficient corresponding to any single usage fluctuation value interval. For example, only the first smoothing coefficient corresponding to the first usage fluctuation value interval is adjusted, and the first smoothing coefficients corresponding to the other usage fluctuation value intervals are not adjusted.
  • Based on the adjusted first smoothing coefficient of the first usage fluctuation value interval and the first smoothing coefficients of the other usage fluctuation value intervals, the process returns to the flow of determining the usable durations of the disk in multiple consecutive periods.
  • If an adjusted first smoothing coefficient of the first usage fluctuation value interval makes the duration fluctuation values of the usable durations of all adjacent periods among the multiple consecutive periods less than or equal to the second set threshold, the adjusted first smoothing coefficient of the first interval and the first smoothing coefficients of the other intervals are used as the final smoothing coefficients of the respective usage fluctuation value intervals.
  • If, by the time the first smoothing coefficient of the first usage fluctuation value interval has been reduced to a value equal to or slightly greater than the preset step size, no adjusted value satisfies this condition, adjustment of the first smoothing coefficient of the second usage fluctuation value interval begins. At the same time, the first smoothing coefficient of the first interval is restored to its initially set value, and the first smoothing coefficients of the other intervals are not adjusted.
  • The same check is then applied: whether the adjusted first smoothing coefficient of the second interval makes the duration fluctuation values of the usable durations of all adjacent periods less than or equal to the second set threshold.
  • Proceeding in this way, whenever the adjusted first smoothing coefficient of one usage fluctuation value interval, together with the unadjusted first smoothing coefficients of the other intervals, satisfies the condition for stopping the determination of the usable durations of the disk in multiple consecutive periods, those coefficients are used as the final smoothing coefficients of the respective usage fluctuation value intervals.
  • If no single-interval adjustment satisfies the condition, the adjustment can proceed according to the second adjustment method.
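  • The first adjustment method can be sketched as a search that lowers one interval's coefficient at a time. This is an illustrative sketch under stated assumptions: durations_for is a hypothetical callback standing in for the flow that determines the usable durations in consecutive periods, and the toy model and numbers are invented for the example.

```python
def duration_fluctuations(durations):
    """|d[k+1] - d[k]| / d[k] for each pair of adjacent periods."""
    return [abs(b - a) / a for a, b in zip(durations, durations[1:])]


def adjust_single_interval(coeffs, step, threshold, durations_for):
    """Lower one interval's coefficient at a time, leaving the others unchanged.

    durations_for(coeffs) is a hypothetical callback returning the usable
    durations of the disk in consecutive periods for the given coefficients.
    Returns the first coefficient list meeting the threshold, or None.
    """
    if max(duration_fluctuations(durations_for(list(coeffs)))) <= threshold:
        return list(coeffs)
    for index, initial in enumerate(coeffs):
        trial = list(coeffs)              # other intervals keep their initial values
        k = 1
        # Subtract the step repeatedly until the value is about one step in size.
        while initial - k * step >= step - 1e-12:
            trial[index] = round(initial - k * step, 10)
            if max(duration_fluctuations(durations_for(trial))) <= threshold:
                return trial
            k += 1
    return None


# Toy stand-in: only the second interval's coefficient affects the durations.
def toy_durations(c):
    return [50.0, 50.0 * (1 + 5 * c[1])]


result = adjust_single_interval([0.05, 0.10, 0.20], step=0.01,
                                threshold=0.22, durations_for=toy_durations)
```

  • Returning None signals that no single-interval adjustment met the threshold, which is the point at which the text switches to the second adjustment method.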
  • The second adjustment method adjusts, based on the preset step size, the first smoothing coefficients corresponding to at least two usage fluctuation value intervals, for example those of the first and second usage fluctuation value intervals. The first smoothing coefficient of the first interval is repeatedly reduced by the preset step size until it is equal to or slightly greater than the preset step size, which yields multiple values. The first smoothing coefficient of the first interval and these values together serve as the multiple smoothing coefficient values of the first interval.
  • For example, subtracting the preset step size l from the first smoothing coefficient α_1 of the first usage fluctuation value interval gives a value q = α_1 - l, subtracting l from q gives a value h, subtracting l from h gives a value d, and so on. The first smoothing coefficient α_1 and the values q, h, d, and so on are the multiple smoothing coefficient values of the first usage fluctuation value interval.
  • Likewise, the first smoothing coefficient of the second usage fluctuation value interval is reduced by the preset step size each time until it is equal to or slightly greater than the first smoothing coefficient of the first usage fluctuation value interval, which yields multiple values. The first smoothing coefficient of the second interval and these values together serve as the multiple smoothing coefficient values of the second interval.
  • For example, subtracting the preset step size l from the first smoothing coefficient α_2 of the second usage fluctuation value interval gives a value f = α_2 - l, subtracting l from f gives a value w, subtracting l from w gives a further value, and so on.
  • Any one of the smoothing coefficient values of the first interval is then selected as the adjusted first smoothing coefficient of the first interval, and any one of the smoothing coefficient values of the second interval is selected as the adjusted first smoothing coefficient of the second interval.
  • In other words, the smoothing coefficient values of the first interval and those of the second interval are combined in pairs (any value of the first interval with any value of the second interval forms one combination), so that multiple combinations are obtained. For each combination, together with the first smoothing coefficients of the usage fluctuation value intervals other than the first and second intervals, the process returns to determining the usable durations of the disk in multiple consecutive periods.
  • If a combination makes the duration fluctuation values of the usable durations of all adjacent periods among the multiple consecutive periods less than or equal to the second set threshold, that combination and the first smoothing coefficients of the intervals other than the first and second intervals are used as the final smoothing coefficients of the respective usage fluctuation value intervals.
  • The same approach can also combine the smoothing coefficient values of three usage fluctuation value intervals three at a time (one value from each of the three intervals forms one combination) and then proceed as described above, which is not repeated here.
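  • The pairwise combination described above can be sketched with itertools.product. As before, durations_for is a hypothetical callback, and the toy model, the floor values, and the numbers are assumptions made for the illustration.

```python
from itertools import product


def candidate_values(initial, step, floor):
    """initial, initial - step, ... down to a value equal to or just above floor."""
    values, k = [], 0
    while initial - k * step >= floor - 1e-12:
        values.append(round(initial - k * step, 10))
        k += 1
    return values


def adjust_two_intervals(coeffs, step, threshold, durations_for):
    """Try every pair of candidate values for the first two intervals."""
    first = candidate_values(coeffs[0], step, floor=step)
    second = candidate_values(coeffs[1], step, floor=coeffs[0])
    for a, b in product(first, second):
        trial = [a, b] + list(coeffs[2:])     # remaining intervals are not adjusted
        durations = durations_for(trial)
        worst = max(abs(y - x) / x for x, y in zip(durations, durations[1:]))
        if worst <= threshold:
            return trial
    return None


# Toy stand-in: both adjusted coefficients influence the usable durations.
def toy_durations(c):
    return [50.0, 50.0 * (1 + 2 * c[0] + 2 * c[1])]


result = adjust_two_intervals([0.05, 0.10, 0.20], step=0.01,
                              threshold=0.25, durations_for=toy_durations)
```

  • Extending the product to three candidate lists gives the three-way combination mentioned above, at the cost of a larger search space.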
  • In a specific implementation, adjustments can first be made according to the first adjustment method. If, after the first smoothing coefficient of each usage fluctuation value interval has been adjusted in turn, no adjusted first smoothing coefficient satisfies the condition for stopping the determination of the usable durations of the disk in multiple consecutive periods, adjustments can then be made according to the second adjustment method.
  • Alternatively, adjustments can first be made according to the second adjustment method. If no combination satisfies the condition for stopping the determination of the usable durations of the disk in multiple consecutive periods, adjustments can then be made according to the first adjustment method.
  • The preset step size can be set according to the application scenario or the experience of those skilled in the art, and can also be adjusted dynamically while the smoothing coefficients are being calculated, which is not limited in the embodiment of the present invention. The smoothing coefficient α is then calculated based on the historical disk usage data of the disk.
  • The duration fluctuation threshold (that is, the second set threshold) for the usable durations of the disk in any adjacent periods among multiple consecutive periods can be set according to the application scenario or the experience of those skilled in the art, and can also be set dynamically according to actual changes.
  • For example, the allowable change rate of the usable duration between the T-th period and the (T+1)-th period (that is, the duration fluctuation threshold) is set to C = 20%. Then, based on the disk usage of the disk in multiple historical periods, the process of determining the usable duration of the disk in a certain period (such as the T-th period) is performed.
  • During this process, the smoothing coefficients corresponding to the different usage fluctuation value intervals are used for the corresponding smoothing, so as to determine the usable duration of the disk in the T-th period.
  • In the same way, the usable durations of the disk in multiple consecutive periods after the T-th period (for example, the (T+1)-th, (T+2)-th, and (T+3)-th periods) can be determined.
  • The usable durations of the disk in multiple consecutive periods (such as the T-th, (T+1)-th, (T+2)-th, and (T+3)-th periods) and the duration fluctuation values of adjacent periods can be as shown in Table 1.
  • If any duration fluctuation value in Table 1 is greater than the allowable usable duration change rate, the first smoothing coefficient corresponding to a usage fluctuation value interval is adjusted, and Table 1 is updated according to the re-determined usable durations of the disk in multiple consecutive periods, until every duration fluctuation value in Table 1 is less than or equal to the allowable usable duration change rate.
  • The format of the smoothing coefficient record may be as shown in Table 2.
  • Table 2
    Usage fluctuation value interval | Smoothing coefficient
    Less than 30% | 0.05
    Greater than or equal to 30% and less than 60% | 0.1
    Greater than or equal to 60% | 0.2
  • Each usage fluctuation value interval and its corresponding second smoothing coefficient can be re-determined according to the actual application scenario; that is, the intervals and their second smoothing coefficients can change dynamically.
  • Step 204: If the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, take the (i+1)-th period as the prediction period and return to obtaining the disk usage of the disk to be predicted in each period of the sliding window before the prediction period. This repeats until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, whereupon the usable duration of the disk to be predicted at the (i-1)-th period is determined to be j periods.
  • Taking the (i+1)-th period as the prediction period and returning to obtain the disk usage of the disk to be predicted in each period of the sliding window before the prediction period is equivalent to obtaining the disk usage of the disk to be predicted in each period of the sliding window before the (i+1)-th period, so that the predicted disk usage of each prediction period can be determined in a loop.
  • The first set threshold may be set according to the application scenario or the experience of those skilled in the art (for example, the first set threshold is set to 0), which is not limited in this embodiment of the present invention.
  • The following describes the determination of the usable duration of the disk to be predicted at the (i-1)-th period through an exemplary processing flow, namely:
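  • The exemplary processing flow is not reproduced in this text. A minimal sketch of the loop in Step 204 follows, under several labeled assumptions: disk usage is treated as the amount consumed per period, the remaining capacity is reduced by each predicted usage, the sliding window advances over predicted values, and predict_next is a hypothetical stand-in for rebuilding and evaluating the disk usage prediction function.

```python
from statistics import mean
from typing import Callable, List, Sequence


def usable_periods(window: Sequence[float],
                   remaining_capacity: float,
                   predict_next: Callable[[Sequence[float]], float],
                   threshold: float = 0.0,
                   max_periods: int = 10_000) -> int:
    """Count how many times the loop returns (j) before capacity runs out."""
    history: List[float] = list(window)
    size = len(history)
    j = 0
    while j < max_periods:                         # guard against a non-positive forecast
        predicted = predict_next(history[-size:])  # rebuild on the current window
        if remaining_capacity - predicted <= threshold:
            break
        remaining_capacity -= predicted            # assumed capacity update
        history.append(predicted)                  # assumed: window slides over predictions
        j += 1
    return j


# Stand-in predictor: the mean of the window, used here only to keep the sketch short.
j = usable_periods([10.0] * 30, remaining_capacity=35.0, predict_next=mean)
```

  • With a constant usage of 10 per period and 35 units of capacity left, the loop returns three times before the fourth period would exceed the remaining capacity.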
  • The above embodiment shows that the prior-art solution determines the usable duration of a disk by removing one minimum value, removing one maximum value, and removing one maximum value and one minimum value at the same time. That solution is fairly accurate when the disk usage within 30 days contains one or two data noise points, but it cannot accurately determine the usable duration of the disk when the disk usage within 30 days contains three or more data noise points.
  • In contrast, the technical solution of the present invention constructs a disk usage prediction function according to the disk usage of the disk to be predicted in each period of the sliding window before the prediction period (that is, the i-th period), and uses the disk usage prediction function to determine the predicted disk usage of the disk to be predicted in the prediction period.
  • If the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, the (i+1)-th period is taken as the prediction period and the process returns to obtaining the disk usage of the disk to be predicted in each period of the sliding window before the prediction period, which is equivalent to obtaining the disk usage in each period of the sliding window before the (i+1)-th period.
  • The predicted disk usage of each prediction period is thus determined in a loop. Once, after the j-th return, the difference between the remaining disk capacity before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, the loop exits, and the usable duration of the disk to be predicted at the (i-1)-th period is dynamically and accurately calculated as j periods.
  • Because the disk usage prediction function is reconstructed in each cycle, it is more realistic and better matches the real situation reflected by the disk usage of each period in the current sliding window. The usable duration of the disk to be predicted at the (i-1)-th period can therefore be determined more truly and accurately, and the accuracy of determining the usable duration of the disk is effectively improved.
  • In addition, since the solution determines the predicted disk usage of the disk to be predicted in each prediction period through cyclic execution, it calculates the usable duration at the (i-1)-th period automatically. This avoids excessive manual intervention, reduces the time and manpower spent on manually determining the usable duration, and improves the efficiency of calculating the usable duration of the disk to be predicted at the (i-1)-th period.
  • FIG. 3 exemplarily shows a device for determining the usable time of a disk provided by an embodiment of the present invention, and the device can execute the flow of the method for determining the usable time of a disk.
  • the device includes:
  • The obtaining unit 301 is configured to obtain the disk usage of the disk to be predicted in each period of the sliding window before the prediction period, the prediction period being the i-th period;
  • The processing unit 302 is configured to: construct a disk usage prediction function according to the disk usage in each period within the sliding window; determine the predicted disk usage of the disk to be predicted in the prediction period through the disk usage prediction function; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, take the (i+1)-th period as the prediction period and return to obtaining the disk usage of the disk to be predicted in each period of the sliding window before the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable duration of the disk to be predicted at the (i-1)-th period is j periods.
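  • The processing unit 302 is specifically configured to:
  • for any prediction function construction mode, generate n function parameter groups corresponding to the prediction function construction mode; based on the n function parameter groups, sequentially execute the prediction function construction mode, determine the loss function values corresponding to each of the n function parameter groups, compare these loss function values to determine the minimum loss function value, and determine the function parameter group corresponding to the minimum loss function value;
  • construct the disk usage prediction function based on the function parameter group corresponding to the minimum loss function value.
  • The processing unit 302 is specifically configured to:
  • for each function parameter group, construct the function corresponding to the function parameter group, input each period within the sliding window to that function to determine the predicted disk usage corresponding to each period, and determine the loss function value corresponding to the function parameter group through the predicted disk usage corresponding to each period and the real disk usage corresponding to each period.
  • The processing unit 302 is specifically configured to:
  • determine the first disk usage of the disk to be predicted in the i-th period through the disk usage prediction function, determine the second disk usage by smoothing the first disk usage, and take the second disk usage as the predicted disk usage of the disk to be predicted in the prediction period.
  • The processing unit 302 is specifically configured to:
  • determine the third disk usage of the disk to be predicted in the (i-1)-th period through the disk usage prediction function, calculate the usage fluctuation value between the first disk usage and the third disk usage, and determine from the smoothing coefficient record the smoothing coefficient corresponding to the usage fluctuation value interval in which the usage fluctuation value falls;
  • based on the smoothing coefficient and the first disk usage, determine the second disk usage.
  • The processing unit 302 is specifically configured to determine the smoothing coefficient record by:
  • setting a first smoothing coefficient for each usage fluctuation value interval and a preset step size, and determining the usable durations of the disk in multiple consecutive periods based on the disk usage of the disk in multiple historical periods and the first smoothing coefficients;
  • adjusting the first smoothing coefficient corresponding to at least one usage fluctuation value interval based on the preset step size and, based on the adjusted first smoothing coefficient and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to determine the usable durations of the disk in multiple consecutive periods, until the duration fluctuation values of the usable durations of all adjacent periods are less than or equal to the second set threshold, so as to determine the second smoothing coefficients corresponding to the usage fluctuation value intervals;
  • storing the second smoothing coefficients corresponding to the usage fluctuation value intervals in the smoothing coefficient record.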
  • processing unit 302 is specifically configured to:
  • The embodiment of the present invention also provides a computing device, as shown in FIG. 4, including at least one processor 401 and a memory 402 connected to the at least one processor.
  • The embodiment of the present invention does not limit the specific connection medium between the processor 401 and the memory 402; in FIG. 4, a bus connection between the processor 401 and the memory 402 is taken as an example.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on.
  • The memory 402 stores instructions that can be executed by the at least one processor 401.
  • By executing the instructions stored in the memory 402, the at least one processor 401 can perform the steps included in the aforementioned method for determining the usable duration of a disk.
  • The processor 401 is the control center of the computing device. It can use various interfaces and lines to connect the various parts of the computing device, and it implements data processing by running or executing the instructions stored in the memory 402 and calling the data stored in the memory 402.
  • The processor 401 may include one or more processing units, and the processor 401 may integrate an application processor and a modem processor.
  • The modem processor mainly handles issuing instructions. It can be understood that the modem processor may also not be integrated into the processor 401.
  • In some embodiments, the processor 401 and the memory 402 can be implemented on the same chip; in other embodiments, they can be implemented on independent chips.
  • The processor 401 can be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention.
  • A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the method for determining the usable duration of a disk can be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The memory 402 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • The memory 402 can include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on.
  • The memory 402 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • The memory 402 in the embodiment of the present invention may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
  • An embodiment of the present invention also provides a computer-readable storage medium, which stores a computer program executable by a computing device; when the program is run on the computing device, the computing device is caused to execute the steps of the above-mentioned method for determining the usable duration of a disk.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Abstract

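Embodiments of the present invention provide a method and apparatus for determining the usable time of a disk. The method comprises: constructing a disk usage prediction function according to the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period; determining, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, returning to the step of acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference is less than or equal to the first set threshold, whereupon the usable time of the disk to be predicted at the (i-1)-th period can be determined to be j periods. By repeatedly reconstructing the disk usage prediction function in this loop, the solution can effectively improve the accuracy of determining the usable time of a disk.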

Description

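Method and apparatus for determining the usable time of a disk
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202111027635.7, entitled "Method and apparatus for determining the usable time of a disk" and filed with the China Patent Office on September 2, 2021, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a method and apparatus for determining the usable time of a disk.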
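Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology; at the same time, the security and real-time requirements of the financial industry place higher demands on technology. As informatization advances rapidly, the volume and variety of data keep growing, and disks are commonly chosen as the storage medium for this data. To make full use of disk capacity and ensure that data can be stored to disk in time, the usable time of a disk is usually predicted. How to predict the usable time of a disk effectively has therefore become an urgent problem.
At present, a 30-day time window is typically used: the average daily usage increment is calculated from the disk usage within the 30 days, and the usable time of the disk is predicted from the current remaining disk capacity and that average daily increment. However, if one or two values within the 30 days surge or drop sharply, they stand out against the usual data volume; these values are data noise and make the predicted usable time inaccurate. To address this, one existing solution predicts the usable time by removing one minimum value, removing one maximum value, and removing both one maximum and one minimum from the 30 days of disk usage. That is, three average daily increments are calculated after removing, respectively, one minimum, one maximum, and both one maximum and one minimum, and the usable time is then predicted from the current remaining disk capacity and the three average daily increments. This approach gives a fairly accurate prediction when the 30 days of disk usage contain one or two noise points, but the prediction remains inaccurate when they contain multiple noise points (for example, three or more). It also relies on excessive manual intervention during prediction, which makes predicting the usable time inefficient.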
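The prior-art baseline described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and because the text does not say how the three averages are combined into a single prediction, the sketch simply returns one estimate per average.

```python
def trimmed_daily_averages(usage):
    """Return three daily averages: without one minimum, without one maximum,
    and without both (the trimming described for the prior art)."""
    ordered = sorted(usage)
    no_min = ordered[1:]      # drop one minimum
    no_max = ordered[:-1]     # drop one maximum
    no_both = ordered[1:-1]   # drop one minimum and one maximum
    return [sum(values) / len(values) for values in (no_min, no_max, no_both)]


def days_left_estimates(remaining, usage):
    """One usable-time estimate per trimmed average (hypothetical helper)."""
    return [remaining / avg if avg > 0 else float("inf")
            for avg in trimmed_daily_averages(usage)]
```

With a single spike and a single dip the third average recovers the typical daily increment, but a third outlier would survive the trimming, which is the weakness the text points out.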
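In summary, a method for determining the usable time of a disk is urgently needed to effectively improve the accuracy of that determination.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for determining the usable time of a disk, so as to effectively improve the accuracy of determining the usable time of a disk.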
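In a first aspect, an embodiment of the present invention provides a method for determining the usable time of a disk, comprising:
acquiring the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period, the prediction period being the i-th period;
constructing a disk usage prediction function according to the disk usage in each period within the sliding window;
determining, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period;
if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, taking the (i+1)-th period as the prediction period and returning to the step of acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.
In the above technical solution, a disk usage prediction function is constructed according to the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period (that is, the i-th period), and the predicted disk usage of the disk in the prediction period is determined by means of that function. When the difference between the remaining disk capacity before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, the (i+1)-th period is taken as the prediction period and the acquisition step is executed again, which amounts to acquiring the disk usage in each period within the sliding window preceding the (i+1)-th period. The predicted disk usage in each prediction period is thus determined in a loop until, after the j-th return, the difference is less than or equal to the first set threshold; the loop then exits, and the usable time of the disk at the (i-1)-th period is calculated dynamically and accurately as j periods. Because the disk usage prediction function is reconstructed in every iteration, the function built in each iteration fits reality more closely and better reflects the disk usage in each period of the current sliding window, so the usable time at the (i-1)-th period is determined more truthfully and accurately, which effectively improves the accuracy of determining the usable time of a disk. In addition, because the usable time at the (i-1)-th period is calculated automatically by this loop, excessive manual intervention is avoided, the time and labor spent on determining the usable time manually are reduced, and the efficiency of calculating the usable time at the (i-1)-th period is improved.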
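Optionally, constructing the disk usage prediction function according to the disk usage in each period within the sliding window comprises:
applying m prediction function construction modes, by means of the least squares method, to the disk usage in each period within the sliding window;
for each prediction function construction mode, generating n function parameter groups corresponding to the prediction function construction mode; executing the prediction function construction mode for the n function parameter groups in turn to determine the loss function value corresponding to each of the n function parameter groups; and comparing the loss function values corresponding to the n function parameter groups to determine the minimum loss function value and the function parameter group corresponding to the minimum loss function value;
comparing the minimum loss function values corresponding to the m prediction function construction modes to determine the smallest minimum loss function value, and taking the function parameter group corresponding to the smallest minimum loss function value as the target function parameter group, thereby constructing the disk usage prediction function.
In the above technical solution, for each of the m prediction function construction modes, n function parameter groups corresponding to the mode are generated and the mode is executed for the n groups in turn, which yields the loss function value of each group. Comparing the loss function values of the n groups under the mode promptly identifies the minimum loss function value and the function parameter group corresponding to it. Once the minimum loss function value and its function parameter group have been determined for each of the m modes, comparing the m minimum loss function values accurately identifies the smallest of them, and the function parameter group corresponding to that smallest minimum loss function value is taken as the target function parameter group. The disk usage prediction function can then be constructed accurately from the target function parameter group. By providing m prediction function construction modes, the solution makes the constructed disk usage prediction function fit reality more closely and more accurately, so the predicted disk usage of the disk in a given prediction period can be determined more accurately.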
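Optionally, executing the prediction function construction mode for the n function parameter groups in turn to determine the loss function value corresponding to each of the n function parameter groups comprises:
for each of the n function parameter groups, constructing the function corresponding to the function parameter group;
inputting each period within the sliding window into the function corresponding to the function parameter group to determine the predicted disk usage corresponding to each period;
determining the loss function value corresponding to the function parameter group from the predicted disk usage corresponding to each period and the actual disk usage corresponding to each period.
In the above technical solution, under a given prediction function construction mode, a function is constructed for each of the n function parameter groups randomly generated under that mode. Inputting each period within the sliding window into the function automatically yields the predicted disk usage for that period. The difference between the predicted disk usage and the actual disk usage is computed for each period, and the differences for all periods are accumulated, or alternatively the squares of the differences are accumulated, which accurately yields the loss function value corresponding to the function parameter group.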
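The loss computation described above can be illustrated with a short sketch. It assumes a polynomial form for the function built from a parameter group and uses the squared-difference variant of the loss; the function names are hypothetical and periods are numbered from 1.

```python
def predict(theta, x):
    """Evaluate F(x) = theta[0] + theta[1]*x + ... + theta[k]*x**k."""
    return sum(coeff * x ** power for power, coeff in enumerate(theta))


def squared_loss(theta, actual):
    """Sum of squared differences between predicted and actual usage
    over periods 1..len(actual)."""
    return sum((predict(theta, t) - y) ** 2
               for t, y in enumerate(actual, start=1))


def best_parameter_group(groups, actual):
    """Pick the parameter group with the smallest loss function value."""
    return min(groups, key=lambda theta: squared_loss(theta, actual))
```

For instance, with actual usage [3, 5, 7] the group [1, 2] (that is, F(x) = 1 + 2x) has zero loss and is selected over [0, 1] or [2, 2].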
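Optionally, determining, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period comprises:
determining, by means of the disk usage prediction function, a first disk usage of the disk to be predicted in the i-th period;
smoothing the first disk usage to determine a second disk usage, and taking the second disk usage as the predicted disk usage of the disk to be predicted in the prediction period.
In the above technical solution, disk usage often surges or drops sharply while a disk is in use. To determine the usable time of the disk at a given period more accurately, the solution smooths the first disk usage that the disk usage prediction function yields for a prediction period, so that the resulting second disk usage does not deviate much from the normal disk usage of the disk in ordinary periods and stays within the normal range of disk usage. This eliminates the effect that abnormal factors, such as a surge or sharp drop in disk usage in one or several periods within the sliding window, have on the accuracy of the determined usable time; in other words, it removes the interference that one or more surges or drops within the sliding window cause when determining the usable time of the disk at a given period. The usable time at a given period can therefore be determined more truthfully and accurately, and the usable times over multiple consecutive periods remain continuous, which avoids usable times that jump up and down across periods. Moreover, in the existing solution, if the 30 days of disk usage contain multiple noise points, one or more of them cannot be removed (because the existing solution removes only one minimum, one maximum, or both one maximum and one minimum); and if the 30 days contain one or two noise points that are removed, the usable times over multiple consecutive periods jump up and down and are therefore discontinuous. By smoothing the disk usage that the disk usage prediction function yields for a prediction period, the technical solution of the present invention resolves the discontinuity of usable times over multiple consecutive periods found in the prior art.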
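Optionally, smoothing the first disk usage to determine the second disk usage comprises:
determining, by means of the disk usage prediction function, a third disk usage of the disk to be predicted in the (i-1)-th period;
determining the usage fluctuation value between the first disk usage and the third disk usage;
determining the usage fluctuation value interval in the smoothing coefficient record to which the usage fluctuation value belongs, and determining the smoothing coefficient corresponding to that usage fluctuation value interval;
determining the second disk usage based on the smoothing coefficient and the first disk usage.
In the above technical solution, disk usage often surges or drops sharply while a disk is in use. If a single fixed smoothing coefficient were used to smooth the first disk usage for every usage fluctuation value, the predicted disk usage of the disk in a prediction period could not be reflected truthfully. A single fixed smoothing coefficient also cannot meet the needs of different usage fluctuation values and is inflexible, which impairs the accuracy and fidelity of the second disk usage. The technical solution of the present invention therefore configures a different smoothing coefficient for each usage fluctuation value interval, so that for every usage fluctuation value falling within an interval, the corresponding first disk usage is smoothed with the smoothing coefficient of that interval. The smoothing coefficient of each interval thus better matches the actual needs of the interval and is more flexible. In addition, if different usage fluctuation value intervals are encountered while determining the usable time of the disk at a given period, the smoothing coefficient is adjusted adaptively, so that the usable time of the disk at that period changes more smoothly.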
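Optionally, the smoothing coefficient record is determined as follows:
setting a first smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record, and a preset step size for adjusting the first smoothing coefficient;
determining the usable time of a disk in multiple consecutive periods based on the disk usage of the disk in multiple historical periods and the first smoothing coefficient corresponding to at least one usage fluctuation value interval;
determining the duration fluctuation value of the usable times of any adjacent periods among the multiple consecutive periods;
if at least one duration fluctuation value is greater than a second set threshold, adjusting, based on the preset step size, the first smoothing coefficient corresponding to at least one usage fluctuation value interval, and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods, until the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods are all less than or equal to the second set threshold, thereby determining a second smoothing coefficient corresponding to each usage fluctuation value interval;
storing the second smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record.
In the above technical solution, existing smoothing methods set one fixed smoothing coefficient manually, based on the application scenario or on experience, which is subjective and inflexible and cannot truthfully and accurately reflect the predicted disk usage of a disk in a prediction period. The technical solution of the present invention therefore sets a first smoothing coefficient for each usage fluctuation value interval, together with a preset step size for adjusting the first smoothing coefficients. The process of determining the usable time of the disk in multiple consecutive periods is then executed in a loop on the disk usage of multiple historical periods, continually updating the first smoothing coefficients of the intervals, until the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods are all less than or equal to the second set threshold. The second smoothing coefficients determined in this way better match the actual needs of their intervals. Furthermore, if different usage fluctuation value intervals are encountered while determining the usable time of the disk at a given period, each is smoothed with the smoothing coefficient of its own interval, which is more flexible, lets each smoothing coefficient adaptively adjust its corresponding first disk usage, and makes the usable time of the disk at a given period change more smoothly.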
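Optionally, adjusting, based on the preset step size, the first smoothing coefficient corresponding to at least one usage fluctuation value interval, and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods comprises:
adjusting, based on the preset step size, the first smoothing coefficient corresponding to any one of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficient corresponding to that usage fluctuation value interval and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than that interval, returning to the step of determining the usable time of the disk in multiple consecutive periods; or
adjusting, based on the preset step size, the first smoothing coefficients corresponding to at least two of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficients corresponding to the at least two usage fluctuation value intervals and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than the at least two intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods.
In the above technical solution, the first smoothing coefficient of any one usage fluctuation value interval is adjusted based on the preset step size, for example that of the first usage fluctuation value interval. If, when the process of determining the usable time of the disk in multiple consecutive periods is executed with the adjusted first smoothing coefficient of the first interval and the first smoothing coefficients of the other intervals, the duration fluctuation values of the usable times of any adjacent periods are all less than or equal to the second set threshold, then the adjusted first smoothing coefficient of the first interval and the first smoothing coefficients of the other intervals are taken as the final smoothing coefficients of the intervals. Alternatively, the first smoothing coefficients of at least two usage fluctuation value intervals are adjusted based on the preset step size, for example those of the first and second usage fluctuation value intervals. If, when the process is executed with the adjusted first smoothing coefficients of the first and second intervals and the first smoothing coefficients of the remaining intervals, the duration fluctuation values of the usable times of any adjacent periods are all less than or equal to the second set threshold, then the adjusted first smoothing coefficients of the first and second intervals and the first smoothing coefficients of the remaining intervals are taken as the final smoothing coefficients of the intervals.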
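In a second aspect, an embodiment of the present invention further provides an apparatus for determining the usable time of a disk, comprising:
an acquisition unit, configured to acquire the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period, the prediction period being the i-th period;
a processing unit, configured to: construct a disk usage prediction function according to the disk usage in each period within the sliding window; determine, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, take the (i+1)-th period as the prediction period and return to acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.
In a third aspect, an embodiment of the present invention provides a computing device comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the method for determining the usable time of a disk according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program executable by a computing device; when the program is run on the computing device, it causes the computing device to execute the method for determining the usable time of a disk according to any implementation of the first aspect.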
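Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a possible system architecture provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for determining the usable time of a disk provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for determining the usable time of a disk provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.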
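Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To facilitate understanding of the embodiments of the present invention, a system architecture for determining the usable time of a disk that is applicable to the embodiments is first described, taking the possible system architecture shown in FIG. 1 as an example. As shown in FIG. 1, the system architecture can include a monitoring device 100, a data processing system 200, and at least one computer device (such as computer device 301, computer device 302, and computer device 303). The monitoring device 100 and the data processing system 200 can be communicatively connected in a wired or wireless manner, and the monitoring device 100 and each computer device can likewise be communicatively connected in a wired or wireless manner.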
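The monitoring device 100 monitors the disk of each computer device in real time and acquires the disk usage of each disk in real time. The monitoring device 100 can then send the disk usage data of each computer device's disk to the data processing system 200 periodically (for example every 5, 10, or 30 minutes; every 1, 2, or 5 hours; or every day, every 2 days, or every 5 days), and the data processing system 200 stores the received disk usage data. Alternatively, the data can be sent in response to a data acquisition request from the data processing system 200; for example, if the request indicates the disk usage data of the disks of one or several computer devices, the monitoring device 100 sends the disk usage data of those disks to the data processing system 200. The data processing system 200 can also request the disk usage data of the disks of one or several computer devices for a specific period or several periods (for example 5, 10, or 20 days). The data processing system 200 can be a single server (such as a standalone physical server), or a server cluster or distributed system composed of multiple physical servers. For example, a user sends, through a client (which can be a web client or a mobile application client that the data processing system 200 provides to its users), a request to the data processing system 200 to query the usable time of the disks of one or several computer devices at a given period, such as the usable time of the disk of computer device 301 at a given period. After receiving the query request, the data processing system 200 can obtain, from the locally stored disk usage data of the computer devices, the disk usage data of the disk of computer device 301 in each period within the sliding window preceding the prediction period (for example, the nearest period after the current period is taken as the prediction period; if the current period is the 1st of month xx, year xxxx, the prediction period can be the 2nd of month xx, year xxxx). It processes this data to determine the usable time of the disk of computer device 301 at the given period and then sends the result to the user's client for display. Alternatively, after receiving the query request, the data processing system 200 can send the monitoring device 100 a request for the disk usage data of the disk of computer device 301 in each period within the sliding window (for example 5, 10, 20, or 30 days) preceding the prediction period. On receiving the data acquisition request, the monitoring device 100 sends that disk usage data to the data processing system 200, which processes it to determine the usable time of the disk of computer device 301 at the given period and then sends the result to the user's client for display.
It should be noted that the system architecture shown in FIG. 1 is only an example, and the embodiments of the present invention are not limited to it.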
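Based on the above description, FIG. 2 shows an exemplary flow of a method for determining the usable time of a disk provided by an embodiment of the present invention. The flow can be executed by an apparatus for determining the usable time of a disk.
As shown in FIG. 2, the flow specifically includes:
Step 201: acquire the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period.
In the embodiments of the present invention, the disk of a computer device (such as a laptop or desktop computer) is one of its main storage media; it stores large amounts of the device's data and protects against data loss. Determining the disk's usage and its usable time promptly and accurately therefore plays an important role in the operation and maintenance of the computer device's system, so the usable time of the disk at one or several periods is usually predicted to support sensible use of the disk later on. It should be noted that the disk usage in a period is the sum of the amount occupied and the amount cleaned on the disk in that period. The disk usage in a period can therefore be positive, meaning the amount occupied in the period exceeds the amount cleaned, or negative, meaning the amount occupied is less than the amount cleaned. For example, if the amount occupied in a period is 2G and the amount cleaned is 0, the disk usage in the period is 2G; if the amount cleaned is -0.5G, the disk usage is 1.5G. Likewise, if the amount occupied in a period is 2G and the amount cleaned is -3G, the disk usage in the period is -1G; if the amount cleaned is -4G, the disk usage is -2G.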
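The sign convention in the examples above (the cleaned amount is recorded as a non-positive number and added to the occupied amount) can be written out as a small worked example. The function name and the input check are assumptions made for illustration.

```python
def net_usage(occupied_gb, cleaned_gb):
    """Disk usage for one period: occupied amount plus cleaned amount,
    where the cleaned amount is recorded as zero or a negative number."""
    if cleaned_gb > 0:
        raise ValueError("cleaned amount is expected to be zero or negative")
    return occupied_gb + cleaned_gb
```

The four cases from the text give 2, 1.5, -1, and -2 (in G) for cleaned amounts of 0, -0.5, -3, and -4 with 2G occupied.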
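The disk usage data of a disk in one or more periods can be obtained through a monitoring component or a monitoring device. For example, a monitoring component can be set up in a computer device to monitor the disk of that device and acquire its disk usage data in real time; alternatively, a separate monitoring device can be set up outside the computer devices to monitor the disks of multiple computer devices and acquire their disk usage data in real time. The monitoring component can then report the disk usage data of the disk of a computer device to the data processing system, or the monitoring device can report the disk usage data of the disks of multiple computer devices. After obtaining the disk usage data of the disks of one or more computer devices, the data processing system can automatically process the disk usage data of those disks in each period within the sliding window (for example 5, 10, 20, or 30 days) preceding the prediction period (for example the i-th period), so as to determine the usable time of those disks at the (i-1)-th period. It can also determine the usable time of those disks at the (i-1)-th period in response to a request, sent by a user through a client, to query the usable time of the disks of one or more computer devices. The apparatus for determining the usable time of a disk can be provided in the data processing system as one of its functional components; the method for determining the usable time of a disk can be executed by that apparatus, or by a chip or integrated circuit provided in that apparatus.
For example, the data processing system can automatically calculate the usable time of the disk of a computer device (such as computer device A) at a given period (such as the current period, for example the 5th of month xx, year xxxx). It can then obtain, from a local storage device, the disk usage data of computer device A in each period within the sliding window (for example 30 days) preceding the 6th of month xx, year xxxx (the prediction period); for example, the daily disk usage over the 30 days preceding the 6th is Y = [y_1, y_2, …, y_30]. Alternatively, the data processing system can send the monitoring device a request for the disk usage data of computer device A in each period within the sliding window preceding the 6th of month xx, year xxxx. As another option, the data processing system calculates the usable time of the disk at a given period in response to a query request, sent by a user through a client, for the usable time of the disk of a computer device (such as computer device A) at that period (such as the current period, for example the 5th of month xx, year xxxx). In that case too, the data processing system can obtain the disk usage data of computer device A in each period within the sliding window (for example 30 days) preceding the 6th of month xx, year xxxx (the prediction period) from a local storage device, or request it from the monitoring device.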
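Step 202: construct a disk usage prediction function according to the disk usage in each period within the sliding window.
In the embodiments of the present invention, a disk usage prediction function for predicting the disk usage of the disk to be predicted in the prediction period (for example the i-th period) can be fitted from the disk usage of the disk in each period within the sliding window. For example, the least squares method is used to fit the disk usage prediction function F(x) = θ_0 + θ_1×x^1 + θ_2×x^2 + … + θ_n×x^n from the daily disk usage Y = [y_1, y_2, …, y_30] over the 30 days preceding the i-th period. The optimal parameter group θ = [θ_0, θ_1, θ_2, …, θ_n] and the corresponding optimal parameter n are obtained by minimizing the value of the expression shown in Figure PCTCN2022100508-appb-000001. Specifically, m prediction function construction modes are first applied, by means of the least squares method, to the disk usage in each period within the sliding window, and n function parameter groups are generated for each prediction function construction mode. The prediction function construction mode is then executed for each of the n function parameter groups in turn, which yields the loss function value of each group under that mode; comparing these loss function values identifies the minimum loss function value and the function parameter group corresponding to it. The minimum loss function values of the m prediction function construction modes are then compared to find the smallest of them, and the function parameter group corresponding to that smallest minimum loss function value is taken as the target function parameter group, from which the disk usage prediction function is constructed accurately. When determining the loss function value of each of the n function parameter groups, a function is constructed for each group and each period within the sliding window is input into that function to determine the predicted disk usage for each period. The difference between the predicted disk usage and the actual disk usage is computed for each period, and the squares of the differences for all periods are accumulated, which accurately yields the loss function value of the group. By providing m prediction function construction modes, the solution makes the constructed disk usage prediction function fit reality more closely and more accurately, so the predicted disk usage of the disk in a given prediction period can be determined more accurately.
For example, 8 prediction function construction modes are set and the flow of each mode is executed in order (the flow of the first mode, then the flow of the second mode, and so on). This yields, for each mode in turn, the optimal parameter group θ and the corresponding optimal parameter n, as well as the minimum loss error value of the mode. The minimum loss error values of the 8 modes are then compared to determine the smallest of them, together with the optimal parameter group θ and the corresponding optimal parameter n for that smallest value, from which the disk usage prediction function is fitted.
For example, an embodiment of the present invention can use the poly1d function of the Python numpy library to construct the framework of the function F(x), and use the leastsq least-squares function in the scipy library to fit the disk usage prediction function. The fitting of the disk usage prediction function is described below through an exemplary script, namely: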
Figure PCTCN2022100508-appb-000002
Figure PCTCN2022100508-appb-000003
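On this basis, for n = 3 the above script yields the minimum ERROR value for n = 3 and the corresponding best-fit parameter group parameter[θ]. Running the script in turn for n = 4, 5, 6, 7, 8, and 9 yields the minimum ERROR value and the corresponding best-fit parameter group parameter[θ] for each of those values of n. The minimum ERROR values for n = 3 through n = 9 are then compared to find the smallest one. For example, if the minimum ERROR value for n = 5 is the smallest, the corresponding best-fit parameter group is parameter[θ] = [θ_0, θ_1, θ_2, θ_3, θ_4, θ_5]. Substituting this parameter group back into F(x) gives the best-fit disk usage prediction function F(x) = θ_0 + θ_1×x^1 + θ_2×x^2 + θ_3×x^3 + θ_4×x^4 + θ_5×x^5.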
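Since the original script survives only as figure references, the following is a minimal sketch of the fitting procedure as it is described in the text, not a reconstruction of that script. It uses numpy.poly1d and scipy.optimize.leastsq as the text indicates; the function names, the zero starting point, and the rescaling of day indices to (0, 1] are assumptions made here for illustration and numerical stability.

```python
import numpy as np
from scipy.optimize import leastsq


def residuals(params, x, y):
    # np.poly1d takes coefficients ordered from the highest power down to the constant.
    return np.poly1d(params)(x) - y


def fit_usage_function(usage, degrees=range(3, 10)):
    """Fit one polynomial per degree n and keep the one with the smallest
    squared error. Each degree n needs at least n + 1 samples."""
    y = np.asarray(usage, dtype=float)
    scale = float(len(y))
    # Assumption: day indices 1..N are rescaled to (0, 1] to keep high powers well behaved.
    x = np.arange(1, len(y) + 1) / scale
    best = None
    for n in degrees:
        params, _ = leastsq(residuals, np.zeros(n + 1), args=(x, y))
        error = float(np.sum(residuals(params, x, y) ** 2))
        if best is None or error < best[0]:
            best = (error, n, params)
    error, n, params = best
    poly = np.poly1d(params)
    # The returned callable accepts an unscaled day index, e.g. 31 for the day after a 30-day window.
    return (lambda day: float(poly(day / scale))), n, error
```

Because the error is measured on the same window that is used for fitting, higher degrees tend to win the comparison; that is how the selection is described in the text, and guarding against overfitting would be a separate design choice.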
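Step 203: determine, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period.
In the embodiments of the present invention, disk usage often surges or drops sharply while a disk is in use. To determine the usable time of the disk at a given period more accurately, the solution smooths the first disk usage that the disk usage prediction function yields for a prediction period, so that the resulting second disk usage does not deviate much from the normal disk usage of the disk in ordinary periods and stays within the normal range of disk usage. This eliminates the effect that abnormal factors, such as a surge or sharp drop in disk usage in one or several periods within the sliding window, have on the accuracy of the determined usable time, helps determine the usable time at a given period more truthfully and accurately, and keeps the usable times over multiple consecutive periods continuous. Specifically, the first disk usage of the disk to be predicted in the i-th period is first determined by means of the disk usage prediction function; the first disk usage is smoothed to determine the second disk usage; and the second disk usage is taken as the predicted disk usage of the disk to be predicted in the prediction period. The predicted disk usage in the prediction period can be calculated as follows:
S(t) = α×F(t) - (1-α)×S(t-1)
where S(t) denotes the predicted disk usage obtained by smoothing the first disk usage of the disk to be predicted in the t-th period (the prediction period); F(t) denotes the first disk usage of the disk to be predicted in the t-th period; S(t-1) denotes the predicted disk usage obtained by smoothing the first disk usage of the disk to be predicted in the (t-1)-th period; and α denotes the smoothing coefficient used to smooth the first disk usage of the disk to be predicted in the t-th period.
When smoothing the first disk usage, a third disk usage of the disk to be predicted in the (i-1)-th period is first determined by means of the disk usage prediction function, and the usage fluctuation value between the first and third disk usages is calculated; that is, the ratio of the absolute difference between the first and third disk usages to the third disk usage is used as the usage fluctuation value. The usage fluctuation value interval to which this value belongs is then looked up in the smoothing coefficient record, and the smoothing coefficient of that interval is used to smooth the first disk usage. The second disk usage is then determined based on the smoothing coefficient and the first disk usage.
For example, suppose the disk usage of the disk on day 31 is to be predicted. The disk usage prediction function gives the first disk usage for day 31 and the third disk usage for day 30. The usage fluctuation value between the first disk usage for day 31 and the third disk usage for day 30 is calculated, the usage fluctuation value interval to which it belongs is looked up in the smoothing coefficient record, and the smoothing coefficient α_i of that interval is determined. The predicted disk usage of the disk on day 31 is then calculated as described above: S(31) = α_i×F(31) - (1-α_i)×S(30).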
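The fluctuation value and the smoothing step can be sketched as below. The sketch follows the formula exactly as it is written in the text, with a minus sign before the (1-α) term; conventional exponential smoothing uses a plus sign there, so the helper exposes that form as an option for comparison. The function names and the `textbook` flag are hypothetical.

```python
def usage_fluctuation(first_usage, third_usage):
    """|F(i) - F(i-1)| / F(i-1), as the fluctuation value is defined in the text."""
    if third_usage == 0:
        raise ValueError("fluctuation is undefined when the previous prediction is zero")
    return abs(first_usage - third_usage) / third_usage


def smooth(alpha, f_t, s_prev, textbook=False):
    """As written in the text: S(t) = alpha*F(t) - (1 - alpha)*S(t-1).
    With textbook=True the conventional form with a plus sign is used instead."""
    carry = (1 - alpha) * s_prev
    return alpha * f_t + carry if textbook else alpha * f_t - carry
```

For F(31) = 12, S(30) = 10, and α = 0.1, the fluctuation value is 0.2; the formula as written gives about -7.8, while the conventional form gives about 10.2.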
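In addition, disk usage often surges or drops sharply while a disk is in use. If a single fixed smoothing coefficient were used to smooth the first disk usage for every usage fluctuation value, the predicted disk usage of the disk in a prediction period could not be reflected truthfully. A single fixed smoothing coefficient also cannot meet the needs of different usage fluctuation values and is inflexible, which impairs the accuracy and fidelity of the second disk usage. The technical solution of the present invention therefore configures a different smoothing coefficient for each usage fluctuation value interval, so that for every usage fluctuation value falling within an interval, the corresponding first disk usage is smoothed with the smoothing coefficient of that interval. Specifically, the smoothing coefficient record can be determined as follows. First, a first smoothing coefficient is set for each usage fluctuation value interval, together with a preset step size for adjusting the first smoothing coefficients. Next, the usable times of the disk in multiple consecutive periods are determined based on the disk usage of the disk in multiple historical periods (for example, the disk usage over some 30 days during the operation of the disk) and the first smoothing coefficient of at least one usage fluctuation value interval (for example one interval, any two intervals, or three intervals). The duration fluctuation value of the usable times of any adjacent periods among the multiple consecutive periods is then calculated; for example, the ratio of the absolute difference between the usable time of the disk at a first prediction period (such as the k-th period) and its usable time at the adjacent second prediction period (such as the (k+1)-th period) to the usable time at the first prediction period is used as the duration fluctuation value between the two periods. Finally, if one or more duration fluctuation values are greater than the second set threshold, the first smoothing coefficient of at least one usage fluctuation value interval is adjusted based on the preset step size, and the step of determining the usable times of the disk in multiple consecutive periods is executed again with the adjusted first smoothing coefficient of the at least one interval and the unadjusted first smoothing coefficients of the other intervals, until the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods are all less than or equal to the second set threshold. The second smoothing coefficients determined in this way better match the actual needs of their usage fluctuation value intervals.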
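When the first smoothing coefficient of at least one usage fluctuation value interval is adjusted based on the preset step size, two adjustment modes are available. In the first adjustment mode, the first smoothing coefficient of any one usage fluctuation value interval is adjusted based on the preset step size. For example, only the first smoothing coefficient of the first usage fluctuation value interval is adjusted and those of the other intervals are left unchanged; the process of determining the usable time of the disk in multiple consecutive periods is then executed again with the adjusted first smoothing coefficient of the first interval and the first smoothing coefficients of the other intervals. If, before the first smoothing coefficient of the first interval has been adjusted until it is equal to or slightly greater than the preset step size, some adjusted value of it, together with the first smoothing coefficients of the other intervals, makes the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods all less than or equal to the second set threshold, then that adjusted first smoothing coefficient of the first interval and the first smoothing coefficients of the other intervals are taken as the final smoothing coefficients of the intervals. If no such value has appeared by the time the first smoothing coefficient of the first interval has been adjusted until it is equal to or slightly greater than the preset step size, adjustment of the first smoothing coefficient of the second usage fluctuation value interval begins; the first smoothing coefficient of the first interval is restored to its initially set value, and those of the other intervals are left unchanged. The process of determining the usable time of the disk in multiple consecutive periods is then executed again with the adjusted first smoothing coefficient of the second interval and the first smoothing coefficients of the other intervals, to determine whether, before the first smoothing coefficient of the second interval has been adjusted until it is equal to or slightly greater than the first smoothing coefficient of the first interval, some adjusted value of it, together with the first smoothing coefficients of the other intervals, makes the duration fluctuation values of the usable times of any adjacent periods all less than or equal to the second set threshold. If, while adjusting in this way, the adjusted first smoothing coefficient of one interval together with the unadjusted first smoothing coefficients of the other intervals satisfies the above condition for stopping the process of determining the usable time of the disk in multiple consecutive periods, then that adjusted first smoothing coefficient and the unadjusted first smoothing coefficients of the other intervals are taken as the final smoothing coefficients of the intervals. If no adjusted first smoothing coefficient of a single interval, together with the unadjusted first smoothing coefficients of the other intervals, satisfies the stopping condition, adjustment can proceed according to the second adjustment mode.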
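In the second adjustment mode, the first smoothing coefficients of at least two usage fluctuation value intervals are adjusted based on the preset step size, for example those of the first and second usage fluctuation value intervals. Specifically, one preset step size is subtracted from the first smoothing coefficient of the first interval at a time until the result is equal to or slightly greater than the preset step size, which produces multiple values; the first smoothing coefficient of the first interval and these values serve as the candidate smoothing coefficient values of the first interval. For example, subtracting the preset step size l from the first smoothing coefficient α_1 of the first interval gives a value q = α_1 - l; subtracting l from q gives a value h; subtracting l from h gives a value d; and so on until the result is equal to or slightly greater than l. This yields the values q, h, d, and so on, and α_1 together with q, h, d, and so on are the candidate smoothing coefficient values of the first interval. Similarly, one preset step size is subtracted from the first smoothing coefficient of the second interval at a time until the result is equal to or slightly greater than the first smoothing coefficient of the first interval, which produces multiple values; the first smoothing coefficient of the second interval and these values serve as the candidate smoothing coefficient values of the second interval. For example, subtracting l from the first smoothing coefficient α_2 of the second interval gives a value f = α_2 - l; subtracting l from f gives a value w; subtracting l from w gives a value r; and so on until the result is equal to or slightly greater than α_1. This yields the values f, w, r, and so on, and α_2 together with f, w, r, and so on are the candidate smoothing coefficient values of the second interval. The first smoothing coefficients of the intervals other than the first and second are left unchanged. The candidate values of the first interval are then paired with the candidate values of the second interval (each pair consists of any one candidate value of the first interval and any one candidate value of the second interval), which yields multiple combinations. The process of determining the usable time of the disk in multiple consecutive periods is executed again with any one combination and the first smoothing coefficients of the intervals other than the first and second. If some combination, together with the first smoothing coefficients of the other intervals, makes the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods all less than or equal to the second set threshold, then that combination and the first smoothing coefficients of the other intervals are taken as the final smoothing coefficients of the intervals. It should be noted that the combinations can also be formed from the candidate values of three usage fluctuation value intervals (each combination consists of one candidate value from each of the three intervals) and processed in the same way, which is not repeated here.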
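The candidate values and their pairwise combinations in the second adjustment mode can be sketched as follows. The starting coefficients 0.05 and 0.10 and the step 0.01 in the usage note are illustrative values only, and the function name is hypothetical; the number of steps is counted as an integer and values are rounded so that floating-point drift does not add or drop a candidate.

```python
from itertools import product


def candidates(start, step, floor):
    """start, start - step, start - 2*step, ... for as long as the value
    stays equal to or greater than floor."""
    count = max(int((start - floor) / step + 1e-9), 0)
    return [round(start - k * step, 10) for k in range(count + 1)]


def coefficient_pairs(alpha_1, alpha_2, step):
    """All pairings of first-interval and second-interval candidate values."""
    first = candidates(alpha_1, step, floor=step)      # stops at the preset step size
    second = candidates(alpha_2, step, floor=alpha_1)  # stops at the first interval's coefficient
    return list(product(first, second))
```

With α_1 = 0.05, α_2 = 0.10, and l = 0.01, the first interval has five candidates (0.05 down to 0.01), the second has six (0.10 down to 0.05), and there are 30 pairs to try.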
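It should be noted that adjustment can first follow the first adjustment mode; if, after the first smoothing coefficients of the usage fluctuation value intervals have been adjusted, no adjusted first smoothing coefficient of an interval satisfies the above condition for stopping the process of determining the usable time of the disk in multiple consecutive periods, adjustment can then follow the second adjustment mode. Alternatively, adjustment can first follow the second adjustment mode; if no combination satisfies the stopping condition, adjustment can then follow the first adjustment mode.
For example, an initial smoothing coefficient α can be set for each usage fluctuation value interval based on empirical smoothing coefficient values in the field, together with a preset step size l for adjusting the smoothing coefficients, for example l = 0.01. The preset step size can be set according to the application scenario or the experience of those skilled in the art, and can also be adjusted dynamically while the smoothing coefficients are being estimated; the embodiments of the present invention do not limit this. α is then estimated from historical disk usage data. The duration fluctuation threshold (the second set threshold) for the usable times of any adjacent periods among multiple consecutive periods can be set according to the application scenario or the experience of those skilled in the art, and can also be adjusted dynamically as conditions change; for example, the permitted rate of change of the usable time between the T-th and (T+1)-th periods (the duration fluctuation threshold) is set to C = 20%. The process of determining the usable time of the disk at a given period (for example the T-th period) is then executed on the disk usage of multiple historical periods. If different usage fluctuation value intervals are encountered during this process, each is smoothed with the smoothing coefficient of its own interval, which determines the usable time of the disk at the T-th period. In the same way, the usable times of the disk at multiple consecutive periods after the T-th period (for example the (T+1)-th, (T+2)-th, and (T+3)-th periods) can be determined. As an example, the usable times of the disk at multiple consecutive periods (for example the T-th, (T+1)-th, (T+2)-th, and (T+3)-th periods) and the duration fluctuation values of adjacent periods can be as shown in Table 1.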
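Table 1
Date                          T      T+1     T+2     T+29
Usable time                   100    70      60      20
Duration fluctuation value           33.3%   14.3%
As Table 1 shows, the duration fluctuation value of 33.3% at the (T+1)-th period is greater than the permitted rate of change of the usable time, C = 20%. The smoothing coefficient of at least one usage fluctuation value interval can therefore be adjusted according to the first and/or second adjustment mode described above. Table 1 is then updated with the re-determined usable times of the disk at multiple consecutive periods, and this is repeated until every duration fluctuation value in Table 1 is less than or equal to C = 20%. The update then stops, and the latest smoothing coefficients of the usage fluctuation value intervals are saved to the smoothing coefficient record.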
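The check that drives this update loop can be sketched as below, using the definition of the duration fluctuation value given earlier (absolute difference between adjacent usable times divided by the earlier one). The function names are hypothetical. Note that for the usable times 100 and 70 this definition gives 30%, whereas Table 1 lists 33.3%, so the denominator used for the table may differ; either value exceeds C = 20%.

```python
def duration_fluctuations(durations):
    """Relative change between each pair of adjacent usable times."""
    return [abs(later - earlier) / earlier
            for earlier, later in zip(durations, durations[1:])]


def needs_tuning(durations, threshold=0.20):
    """True if any adjacent pair fluctuates by more than the second set threshold."""
    return any(value > threshold for value in duration_fluctuations(durations))
```

For the first three usable times in Table 1, needs_tuning([100, 70, 60]) is True, so at least one smoothing coefficient would be adjusted and the usable times recomputed.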
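As an example, the format of the smoothing coefficient record can be as shown in Table 2.
Table 2
Usage fluctuation value interval                      Smoothing coefficient
Less than 30%                                         0.05
Greater than or equal to 30% and less than 60%        0.1
Greater than or equal to 60%                          0.2
Based on Table 2, suppose the usage fluctuation value between the first disk usage of the disk in a prediction period (for example the i-th period) and the third disk usage of the disk in the (i-1)-th period is 20%. Since 20% < 30%, the smoothing coefficient used to smooth the first disk usage in the i-th period is 0.05. If the usage fluctuation value is 40%, then 30% < 40% < 60%, and the smoothing coefficient used is 0.1. If the usage fluctuation value is 70%, then 70% > 60%, and the smoothing coefficient used is 0.2.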
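The lookup in Table 2 amounts to a small interval table. A minimal sketch, with the record stored as (exclusive upper bound, coefficient) pairs; the data structure and names are assumptions, while the bounds and coefficients are those of Table 2.

```python
# Each entry: (exclusive upper bound of the interval, smoothing coefficient).
SMOOTHING_RECORD = [
    (0.30, 0.05),          # fluctuation < 30%
    (0.60, 0.10),          # 30% <= fluctuation < 60%
    (float("inf"), 0.20),  # fluctuation >= 60%
]


def coefficient_for(fluctuation):
    """Return the smoothing coefficient of the interval containing the value."""
    for upper_bound, alpha in SMOOTHING_RECORD:
        if fluctuation < upper_bound:
            return alpha
    return SMOOTHING_RECORD[-1][1]
```

The three cases in the text, fluctuation values of 20%, 40%, and 70%, map to 0.05, 0.1, and 0.2 respectively; the boundary values 30% and 60% fall into the upper interval in each case.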
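It should be noted that Table 2 is only a simple illustration; the usage fluctuation value intervals and second smoothing coefficients listed in it are given to explain the solution and do not limit it. The usage fluctuation value intervals and their second smoothing coefficients can be re-determined for the actual application scenario; that is, they can change dynamically.
Step 204: if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, take the (i+1)-th period as the prediction period and return to acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.
In the embodiments of the present invention, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, the (i+1)-th period is taken as the prediction period and the acquisition step is executed again, which amounts to acquiring the disk usage in each period within the sliding window preceding the (i+1)-th period. The predicted disk usage in each prediction period is thus determined in a loop until, after the j-th return, the difference is less than or equal to the first set threshold; the loop then exits, and the usable time of the disk to be predicted at the (i-1)-th period is calculated dynamically and accurately as j periods. The first set threshold can be set according to the application scenario or the experience of those skilled in the art (for example, it can be set to 0); the embodiments of the present invention do not limit this. The determination of the usable time of the disk to be predicted at the (i-1)-th period is described below through an exemplary processing flow, namely: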
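int i = 0;  # i is the number of usable days of the disk at the current period; the least squares method is used to fit the disk usage prediction function F(x) from the past 30 days of disk usage data Y = [y_1, y_2, …, y_30]
while (D > 0)  # D is the remaining disk capacity of the disk at the current period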
Figure PCTCN2022100508-appb-000004
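When the remaining disk capacity becomes less than or equal to 0 in some iteration, the loop exits and the number of usable days i of the disk at the current period is output.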
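The loop of step 204 can be sketched as follows. The body of the original loop is available only as a figure reference, so this is an illustration under stated assumptions rather than the original flow: the fitting step is passed in as a callable, the predicted usage of each period is appended to the sliding window for the next iteration (actual data for future periods does not exist yet), and `mean_fit` is a deliberately simple stand-in for the least-squares fit. All names are hypothetical.

```python
def mean_fit(window):
    """Stand-in for the fitted prediction function: always predicts the window mean."""
    average = sum(window) / len(window)
    return lambda x: average


def usable_periods(history, remaining, fit, threshold=0.0, max_periods=10_000):
    """Count how many periods pass before remaining capacity minus predicted
    usage drops to the threshold or below."""
    window = list(history)
    periods = 0
    while periods < max_periods:
        predict = fit(window)             # rebuild the prediction function on the current window
        usage = predict(len(window) + 1)  # predicted usage for the period right after the window
        if remaining - usage <= threshold:
            return periods
        remaining -= usage
        window = window[1:] + [usage]     # slide the window forward by one period
        periods += 1
    return None  # no exhaustion predicted within max_periods (e.g. usage is negative)
```

With 30 days of history at 10 units per day and 95 units remaining, `usable_periods([10.0] * 30, 95.0, mean_fit)` returns 9. The `max_periods` guard is an added safety limit, since net usage can be negative and the loop would otherwise never end.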
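The above embodiments show that the existing solution determines the usable time of a disk by removing one minimum, removing one maximum, and removing both one maximum and one minimum. It is therefore fairly accurate when the 30 days of disk usage contain one or two noise points, but it cannot determine the usable time accurately when they contain three or more. The technical solution of the present invention instead constructs a disk usage prediction function according to the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period (that is, the i-th period), and determines the predicted disk usage in the prediction period by means of that function. When the difference between the remaining disk capacity before the prediction period and the predicted disk usage in the prediction period is greater than the first set threshold, the (i+1)-th period is taken as the prediction period and the acquisition step is executed again, which amounts to acquiring the disk usage in each period within the sliding window preceding the (i+1)-th period. The predicted disk usage in each prediction period is thus determined in a loop until, after the j-th return, the difference is less than or equal to the first set threshold; the loop then exits, and the usable time of the disk at the (i-1)-th period is calculated dynamically and accurately as j periods. Because the disk usage prediction function is reconstructed in every iteration, the function built in each iteration fits reality more closely and better reflects the disk usage in each period of the current sliding window, so the usable time at the (i-1)-th period is determined more truthfully and accurately, which effectively improves the accuracy of determining the usable time of a disk. In addition, because the usable time at the (i-1)-th period is calculated automatically by this loop, excessive manual intervention is avoided, the time and labor spent on determining the usable time manually are reduced, and the efficiency of calculating the usable time at the (i-1)-th period is improved.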
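Based on the same technical concept, FIG. 3 shows an exemplary apparatus for determining the usable time of a disk provided by an embodiment of the present invention. The apparatus can execute the flow of the method for determining the usable time of a disk.
As shown in FIG. 3, the apparatus includes:
an acquisition unit 301, configured to acquire the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period, the prediction period being the i-th period;
a processing unit 302, configured to: construct a disk usage prediction function according to the disk usage in each period within the sliding window; determine, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, take the (i+1)-th period as the prediction period and return to acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.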
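Optionally, the processing unit 302 is specifically configured to:
apply m prediction function construction modes, by means of the least squares method, to the disk usage in each period within the sliding window;
for each prediction function construction mode, generate n function parameter groups corresponding to the prediction function construction mode; execute the prediction function construction mode for the n function parameter groups in turn to determine the loss function value corresponding to each of the n function parameter groups; and compare the loss function values corresponding to the n function parameter groups to determine the minimum loss function value and the function parameter group corresponding to the minimum loss function value;
compare the minimum loss function values corresponding to the m prediction function construction modes to determine the smallest minimum loss function value, and take the function parameter group corresponding to the smallest minimum loss function value as the target function parameter group, thereby constructing the disk usage prediction function.
Optionally, the processing unit 302 is specifically configured to:
for each of the n function parameter groups, construct the function corresponding to the function parameter group;
input each period within the sliding window into the function corresponding to the function parameter group to determine the predicted disk usage corresponding to each period;
determine the loss function value corresponding to the function parameter group from the predicted disk usage corresponding to each period and the actual disk usage corresponding to each period.
Optionally, the processing unit 302 is specifically configured to:
determine, by means of the disk usage prediction function, a first disk usage of the disk to be predicted in the i-th period;
smooth the first disk usage to determine a second disk usage, and take the second disk usage as the predicted disk usage of the disk to be predicted in the prediction period.
Optionally, the processing unit 302 is specifically configured to:
determine, by means of the disk usage prediction function, a third disk usage of the disk to be predicted in the (i-1)-th period;
determine the usage fluctuation value between the first disk usage and the third disk usage;
determine the usage fluctuation value interval in the smoothing coefficient record to which the usage fluctuation value belongs, and determine the smoothing coefficient corresponding to that usage fluctuation value interval;
determine the second disk usage based on the smoothing coefficient and the first disk usage.
Optionally, the processing unit 302 is specifically configured to:
determine the smoothing coefficient record as follows:
set a first smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record, and a preset step size for adjusting the first smoothing coefficient;
determine the usable time of a disk in multiple consecutive periods based on the disk usage of the disk in multiple historical periods and the first smoothing coefficient corresponding to at least one usage fluctuation value interval;
determine the duration fluctuation value of the usable times of any adjacent periods among the multiple consecutive periods;
if at least one duration fluctuation value is greater than a second set threshold, adjust, based on the preset step size, the first smoothing coefficient corresponding to at least one usage fluctuation value interval, and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, return to determining the usable time of the disk in multiple consecutive periods, until the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods are all less than or equal to the second set threshold, thereby determining a second smoothing coefficient corresponding to each usage fluctuation value interval;
store the second smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record.
Optionally, the processing unit 302 is specifically configured to:
adjust, based on the preset step size, the first smoothing coefficient corresponding to any one of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficient corresponding to that usage fluctuation value interval and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than that interval, return to determining the usable time of the disk in multiple consecutive periods; or
adjust, based on the preset step size, the first smoothing coefficients corresponding to at least two of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficients corresponding to the at least two usage fluctuation value intervals and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than the at least two intervals, return to determining the usable time of the disk in multiple consecutive periods.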
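Based on the same technical concept, an embodiment of the present invention further provides a computing device. As shown in FIG. 4, it includes at least one processor 401 and a memory 402 connected to the at least one processor. The embodiments of the present invention do not limit the specific connection medium between the processor 401 and the memory 402; in FIG. 4 the processor 401 and the memory 402 are connected by a bus as an example. The bus can be divided into an address bus, a data bus, a control bus, and so on.
In the embodiments of the present invention, the memory 402 stores instructions executable by the at least one processor 401, and by executing the instructions stored in the memory 402 the at least one processor 401 can execute the steps included in the foregoing method for determining the usable time of a disk.
The processor 401 is the control center of the computing device. It can connect the various parts of the computing device through various interfaces and lines, and implements data processing by running or executing the instructions stored in the memory 402 and calling the data stored in the memory 402. Optionally, the processor 401 can include one or more processing units and can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles issuing instructions. It can be understood that the modem processor need not be integrated into the processor 401. In some embodiments, the processor 401 and the memory 402 can be implemented on the same chip; in other embodiments, they can be implemented on separate chips.
The processor 401 can be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the method for determining the usable time of a disk can be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 402, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 402 can include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 402 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 402 in the embodiments of the present invention can also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a computing device; when the program is run on the computing device, it causes the computing device to execute the steps of the above method for determining the usable time of a disk.
Those skilled in the art will understand that the embodiments of the present invention can be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions.
Evidently, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of this application and their technical equivalents, the present invention is intended to encompass them as well.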

Claims (10)

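  1. A method for determining the usable time of a disk, characterized by comprising:
    acquiring the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period, the prediction period being the i-th period;
    constructing a disk usage prediction function according to the disk usage in each period within the sliding window;
    determining, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period;
    if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, taking the (i+1)-th period as the prediction period and returning to the step of acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.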
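  2. The method according to claim 1, characterized in that constructing the disk usage prediction function according to the disk usage in each period within the sliding window comprises:
    applying m prediction function construction modes, by means of the least squares method, to the disk usage in each period within the sliding window;
    for each prediction function construction mode, generating n function parameter groups corresponding to the prediction function construction mode; executing the prediction function construction mode for the n function parameter groups in turn to determine the loss function value corresponding to each of the n function parameter groups; and comparing the loss function values corresponding to the n function parameter groups to determine the minimum loss function value and the function parameter group corresponding to the minimum loss function value;
    comparing the minimum loss function values corresponding to the m prediction function construction modes to determine the smallest minimum loss function value, and taking the function parameter group corresponding to the smallest minimum loss function value as the target function parameter group, thereby constructing the disk usage prediction function.
  3. The method according to claim 2, characterized in that executing the prediction function construction mode for the n function parameter groups in turn to determine the loss function value corresponding to each of the n function parameter groups comprises:
    for each of the n function parameter groups, constructing the function corresponding to the function parameter group;
    inputting each period within the sliding window into the function corresponding to the function parameter group to determine the predicted disk usage corresponding to each period;
    determining the loss function value corresponding to the function parameter group from the predicted disk usage corresponding to each period and the actual disk usage corresponding to each period.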
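  4. The method according to any one of claims 1 to 3, characterized in that determining, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period comprises:
    determining, by means of the disk usage prediction function, a first disk usage of the disk to be predicted in the i-th period;
    smoothing the first disk usage to determine a second disk usage, and taking the second disk usage as the predicted disk usage of the disk to be predicted in the prediction period.
  5. The method according to claim 4, characterized in that smoothing the first disk usage to determine the second disk usage comprises:
    determining, by means of the disk usage prediction function, a third disk usage of the disk to be predicted in the (i-1)-th period;
    determining the usage fluctuation value between the first disk usage and the third disk usage;
    determining the usage fluctuation value interval in the smoothing coefficient record to which the usage fluctuation value belongs, and determining the smoothing coefficient corresponding to that usage fluctuation value interval;
    determining the second disk usage based on the smoothing coefficient and the first disk usage.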
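  6. The method according to claim 5, characterized in that the smoothing coefficient record is determined as follows:
    setting a first smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record, and a preset step size for adjusting the first smoothing coefficient;
    determining the usable time of a disk in multiple consecutive periods based on the disk usage of the disk in multiple historical periods and the first smoothing coefficient corresponding to at least one usage fluctuation value interval;
    determining the duration fluctuation value of the usable times of any adjacent periods among the multiple consecutive periods;
    if at least one duration fluctuation value is greater than a second set threshold, adjusting, based on the preset step size, the first smoothing coefficient corresponding to at least one usage fluctuation value interval, and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods, until the duration fluctuation values of the usable times of any adjacent periods among the multiple consecutive periods are all less than or equal to the second set threshold, thereby determining a second smoothing coefficient corresponding to each usage fluctuation value interval;
    storing the second smoothing coefficient corresponding to each usage fluctuation value interval in the smoothing coefficient record.
  7. The method according to claim 6, characterized in that adjusting, based on the preset step size, the first smoothing coefficient corresponding to at least one usage fluctuation value interval, and, based on the adjusted first smoothing coefficient corresponding to the at least one usage fluctuation value interval and the unadjusted first smoothing coefficients corresponding to the other usage fluctuation value intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods comprises:
    adjusting, based on the preset step size, the first smoothing coefficient corresponding to any one of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficient corresponding to that usage fluctuation value interval and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than that interval, returning to the step of determining the usable time of the disk in multiple consecutive periods; or
    adjusting, based on the preset step size, the first smoothing coefficients corresponding to at least two of the usage fluctuation value intervals, and, based on the adjusted first smoothing coefficients corresponding to the at least two usage fluctuation value intervals and the first smoothing coefficients corresponding to the usage fluctuation value intervals other than the at least two intervals, returning to the step of determining the usable time of the disk in multiple consecutive periods.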
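  8. An apparatus for determining the usable time of a disk, characterized by comprising:
    an acquisition unit, configured to acquire the disk usage of a disk to be predicted in each period within a sliding window preceding a prediction period, the prediction period being the i-th period;
    a processing unit, configured to: construct a disk usage prediction function according to the disk usage in each period within the sliding window; determine, by means of the disk usage prediction function, the predicted disk usage of the disk to be predicted in the prediction period; and, if the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is greater than a first set threshold, take the (i+1)-th period as the prediction period and return to acquiring the disk usage of the disk to be predicted in each period within the sliding window preceding the prediction period, until, after the j-th return, the difference between the remaining disk capacity of the disk to be predicted before the prediction period and the predicted disk usage in the prediction period is less than or equal to the first set threshold, thereby determining that the usable time of the disk to be predicted at the (i-1)-th period is j periods.
  9. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program is run on the computing device, it causes the computing device to execute the method according to any one of claims 1 to 7.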
PCT/CN2022/100508 2021-09-02 2022-06-22 Method and apparatus for determining the usable time of a disk WO2023029680A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111027635.7 2021-09-02
CN202111027635.7A CN113835626B (zh) 2021-09-02 2021-09-02 Method and apparatus for determining the usable time of a disk

Publications (1)

Publication Number Publication Date
WO2023029680A1 true WO2023029680A1 (zh) 2023-03-09

Family

ID=78962086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100508 WO2023029680A1 (zh) 2021-09-02 2022-06-22 Method and apparatus for determining the usable time of a disk

Country Status (2)

Country Link
CN (1) CN113835626B (zh)
WO (1) WO2023029680A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993087A (zh) * 2023-07-28 2023-11-03 中国电建集团华东勘测设计研究院有限公司 基于多层次嵌套动态规划多目标模型的水库优化调度方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835626B (zh) * 2021-09-02 2024-04-05 深圳前海微众银行股份有限公司 Method and apparatus for determining the usable time of a disk

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469107A (zh) * 2016-08-31 2017-03-01 浪潮(北京)电子信息产业有限公司 一种存储资源的容量预测方法及装置
CN107480028A (zh) * 2017-07-21 2017-12-15 东软集团股份有限公司 磁盘可使用的剩余时长的获取方法及装置
CN111898826A (zh) * 2020-07-31 2020-11-06 北京文思海辉金信软件有限公司 资源消耗预测方法、装置、电子设备及可读存储设备
US20200371896A1 (en) * 2019-05-22 2020-11-26 Vmware, Inc. Exponential decay real-time capacity planning
CN113835626A (zh) * 2021-09-02 2021-12-24 深圳前海微众银行股份有限公司 一种确定磁盘可使用时长的方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105094708B (zh) * 2015-08-25 2018-06-12 北京百度网讯科技有限公司 一种磁盘容量的预测方法及装置
JP6969990B2 (ja) * 2017-11-28 2021-11-24 株式会社東芝 情報処理装置、情報処理方法及びコンピュータプログラム
US11204811B2 (en) * 2018-04-12 2021-12-21 Vmware, Inc. Methods and systems for estimating time remaining and right sizing usable capacities of resources of a distributed computing system
CN113033906A (zh) * 2021-04-07 2021-06-25 山东润一智能科技有限公司 基于三参数指数平滑的能耗预测方法及系统

Also Published As

Publication number Publication date
CN113835626A (zh) 2021-12-24
CN113835626B (zh) 2024-04-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862821

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE