CN113515238A - Data scheduling method and system based on hierarchical storage and electronic equipment - Google Patents

Data scheduling method and system based on hierarchical storage and electronic equipment

Info

Publication number
CN113515238A
Authority
CN
China
Prior art keywords
data
storage
workload
scheduled
spectral density
Legal status
Granted
Application number
CN202110847220.8A
Other languages
Chinese (zh)
Other versions
CN113515238B (en)
Inventor
陈铎
Current Assignee
Huayun Data Holding Group Co Ltd
Original Assignee
Huayun Data Holding Group Co Ltd
Application filed by Huayun Data Holding Group Co Ltd
Priority to CN202110847220.8A
Publication of CN113515238A
Application granted
Publication of CN113515238B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0647 Migration mechanisms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a data scheduling method, a system and an electronic device based on hierarchical storage, wherein the data scheduling method comprises the following steps: acquiring historical workload data of the hierarchical storage, and determining a workload period based on a power spectral density function corresponding to the historical workload data; dynamically setting, according to the workload period, a workload pattern matched with the workload period for the hierarchical storage within a preset period; and setting a migration priority for the data to be scheduled according to the workload pattern within the preset period, and scheduling the data to be scheduled, according to the migration priority, to at least one storage medium in the storage groups with different read/write performance that constitute the hierarchical storage. The invention thereby guides the data to be scheduled to migrate reasonably, according to the performance of the hierarchical storage, to at least one storage medium in the storage groups with different read/write performance, and brings the storage performance of the hierarchical storage into full play.

Description

Data scheduling method and system based on hierarchical storage and electronic equipment
Technical Field
The invention relates to the technical field of data storage, in particular to a data scheduling method and system based on hierarchical storage and electronic equipment.
Background
Tiered Storage (also called hierarchical storage) is a technique that allows data to migrate between different storage tiers, providing the required performance at a lower cost. Data with a higher access frequency is stored in a high-performance storage tier, while most other data is stored in a large-capacity, inexpensive, low-performance tier; the user does not need to know where the data resides, and the system retrieves it automatically. In terms of the physical structure of the storage media, the latest form of tiered storage combines mechanical hard disks with flash memory. Tiered storage in a storage system keeps hot data with a higher access frequency in a tier with high read/write performance, such as an SSD tier, and keeps cold data with a lower access frequency in a tier with low read/write performance, such as an HDD tier.
Referring to fig. 1 and 2, a system, software, or a system containing software that is built on tiered storage components generally collects and learns the workload at a fixed period. EMC FAST VP uses fixed-period statistics that do not take the workload into account and periodically relocates the most active data to the topmost storage tier; Dell's Data Progression runs once a day at a fixed time and likewise does not consider the workload; IBM's storage products collect EMA (Exponential Moving Average) values over a number of fixed periods to judge whether an extent is under sufficient load pressure and is suitable for migrating hot data. The EMA is an infinite impulse response filter that uses exponentially decaying weighting factors.
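For orientation only, an exponential moving average of the kind referred to above can be sketched as follows (a minimal illustration in Python, not any vendor's actual implementation; the smoothing factor alpha is an assumed parameter):

```python
def ema(samples, alpha=0.3):
    """Exponential moving average: an IIR filter in which older samples decay
    with factor (1 - alpha) while each new sample enters with weight alpha."""
    smoothed = []
    current = samples[0]
    for value in samples:
        current = alpha * value + (1 - alpha) * current
        smoothed.append(current)
    return smoothed

# A short burst of high IOPS keeps influencing later values, which is one way
# the "long tail effect" discussed below can blur the real workload peak.
print(ema([10, 12, 80, 11, 10, 9]))
```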
The applicant notes that the above prior art, which characterizes the storage overhead and state of tiered storage on the basis of the workload, cannot timely and accurately reflect the real workload peaks that the tiered storage forms for the data to be scheduled, owing to a "long tail effect". As a result, the storage capacity of a storage device using tiered storage cannot be used effectively, and the overall performance and throughput of that storage device, as well as the IOPS of the computer system containing it, are reduced. In addition, collecting and learning the workload at a fixed period means that the data read/write capability required by a specific access request cannot be met in a bursty high-load scenario, so the investment in deploying tiered storage is wasted; tiered storage that is oversized (or stronger than necessary) or undersized (or weaker than necessary) leads to excessive investment cost and wasted investment.
In view of the above, there is a need to improve the data scheduling method based on hierarchical storage in the prior art to solve the above problems.
Disclosure of Invention
The invention aims to disclose a data scheduling method, a data scheduling system and an electronic device based on hierarchical storage, so as to overcome the defects of the prior-art data scheduling process based on hierarchical storage, in particular to reduce the deployment and maintenance cost of the hierarchical storage, to reduce the read/write cost of the data to be scheduled in the hierarchical storage, and to bring the advantages of the storage devices with different read/write performance that constitute the hierarchical storage into full play during data scheduling, thereby improving the scheduling effect for the data to be scheduled and making reasonable use of the storage space formed by the storage media with different read/write performance in the hierarchical storage.
In order to achieve one of the above objects, the present invention provides a data scheduling method based on hierarchical storage, including:
S1, acquiring historical workload data of the hierarchical storage, and determining a workload period based on a power spectral density function corresponding to the historical workload data;
S2, dynamically setting, according to the workload period, a workload pattern matched with the workload period for the hierarchical storage within a preset period;
S3, setting a migration priority for the data to be scheduled according to the workload pattern within the preset period, and scheduling the data to be scheduled, according to the migration priority, to at least one storage medium in the storage groups with different read/write performance that constitute the hierarchical storage.
As a further improvement of the present invention, the step S1 specifically includes the following steps:
S11, acquiring the historical workload data of the hierarchical storage and performing smoothing processing;
S12, performing an autocorrelation calculation on the smoothed historical workload data;
S13, applying a fast Fourier transform to the historical workload data after the autocorrelation calculation to obtain the power spectral density function corresponding to the historical workload data;
S14, selecting the significant frequency points contained in the power spectral density function and calculating their least common multiple, so as to determine the workload period of the hierarchical storage with respect to the historical workload data.
As a further improvement of the present invention, the step S2 specifically includes the following steps:
S21, pre-judging the data to be scheduled in the hierarchical storage processing;
S22, issuing to the hierarchical storage a policy that configures a plurality of QoS thresholds, dividing the workload period into at least two period segments with different workloads by means of the QoS thresholds, and outputting the workload pattern that is dynamically set for the hierarchical storage within the preset period and matched with the period segments.
As a further improvement of the present invention, after the step S22 outputs the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the period segments, the method further includes: associating each period segment with one of the storage groups with different read/write performance that constitute the hierarchical storage.
As a further improvement of the present invention, after the step S3 sets the migration priority for the data to be scheduled according to the workload pattern within the preset period, the method further includes: generating at least one operation list and displaying it visually, the operation list being associated with the migration priority of the data to be scheduled;
the operation list comprises one or more of a hot data list, a warm data list and a cold data list.
As a further improvement of the present invention, the storage groups include: a first storage group with high read/write performance, a second storage group with medium read/write performance, and a third storage group with low read/write performance;
the first storage group comprises one or more of a register, a cache or a main memory, the second storage group comprises a plurality of SLC solid state disks, and the third storage group comprises one or more of an MLC solid state disk, a TLC solid state disk, a mechanical hard disk, tape storage, RAID 0 to RAID 6, or optical disc storage.
As a further improvement of the present invention, after the step S3 generates the operation list, the method further includes: detecting the current workloads of the first, second and third storage groups that constitute the hierarchical storage, so as to schedule the data to be scheduled corresponding to the migration priorities into the first, second and third storage groups in descending order of migration priority.
As a further improvement of the present invention, after the step S3 generates the operation list, the method further includes: detecting the current workload of each individual storage medium contained in the first, second and third storage groups, so as to schedule data to be scheduled that have the same migration priority to one or more storage media with a relatively lighter workload within the same storage group.
As a further improvement of the present invention, the step S3 further includes: dividing the data to be scheduled into cold data, warm data and hot data, preferentially scheduling the hot data to the first storage group, generating an optimized time period for the hot-data scheduling operation after all the hot data have been scheduled, binding the optimized time period to the migration priority, and excluding migration operations on the warm data and the cold data from the optimized time period within the preset period.
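A sketch of how such migration priorities might be ordered is given below (Python; the task structure, the priority values and the notion of an optimized time window are hypothetical names used only to illustrate the behaviour described above):

```python
from dataclasses import dataclass

@dataclass
class MigrationTask:
    name: str
    temperature: str   # "hot", "warm" or "cold"
    priority: int      # larger value = migrated earlier

def plan_migrations(tasks, optimized_window):
    """Hot data is scheduled first, into the first storage group, inside the
    optimized time window; warm and cold migrations are kept outside it."""
    plan = []
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        if task.temperature == "hot":
            plan.append((task.name, "first storage group", optimized_window))
        else:
            plan.append((task.name, "second/third storage group", "outside optimized window"))
    return plan

# e.g. plan_migrations([MigrationTask("db-index", "hot", 9),
#                       MigrationTask("old-logs", "cold", 1)], "02:00-03:00")
```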
Based on the same inventive concept, the invention also discloses a data scheduling system based on hierarchical storage, which comprises:
a hierarchical storage controller, a storage cluster mounted to the hierarchical storage controller, and a power spectral density function detection module;
the hierarchical storage controller deploys a workload calculation module;
the power spectral density function detection module acquires historical workload data of the hierarchical storage and determines a workload period based on a power spectral density function corresponding to the historical workload data;
the workload calculation module dynamically sets a workload pattern matched with the workload period for the hierarchical storage within a preset period, and sets a migration priority for the data to be scheduled within the preset period according to the workload pattern, so that the data to be scheduled is scheduled, according to the migration priority, to at least one storage medium in the storage groups with different read/write performance that constitute the hierarchical storage.
As a further improvement of the present invention, the hierarchical storage controller comprises: the workload calculation module, an acquisition module, a workload pattern request module and an optimization policy forwarding module;
the acquisition module acquires the historical workload data of the hierarchical storage;
the workload pattern request module is connected with the power spectral density function detection module, initiates a request to determine the workload period based on the power spectral density function corresponding to the historical workload data, and receives the workload period generated by the power spectral density function operation logic contained in the power spectral density function detection module;
the optimization policy forwarding module retrieves, from the workload calculation module, the workload pattern that is dynamically set for the hierarchical storage within the preset period and matched with the workload period, and forwards it to the storage cluster.
As a further improvement of the present invention, the power spectral density function detection module comprises:
a smoothing operation module, which performs smoothing processing on the historical workload data of the hierarchical storage;
a fast Fourier transform module, which performs an autocorrelation calculation on the smoothed historical workload data and then obtains the power spectral density function corresponding to the historical workload data by means of a fast Fourier transform;
a frequency point selection module, which selects a plurality of significant frequency points contained in the power spectral density function and calculates their least common multiple, so as to determine the workload period of the hierarchical storage with respect to the historical workload data;
a workload period adjustment module, which pre-judges the data to be scheduled in the hierarchical storage processing through the power spectral density function operation logic, issues to the hierarchical storage a policy that configures a plurality of QoS thresholds, divides the workload period into at least two period segments with different workloads by means of the QoS thresholds, and outputs to the workload pattern request module the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the period segments.
As a further improvement of the present invention, the power spectral density function detection module is disposed in a logical operation device logically independent from the hierarchical storage controller, the logical operation device running a power spectral density function operation logic.
Finally, based on the same inventive concept, the present invention further discloses an electronic device, comprising:
a processor, a storage device comprising at least one memory unit, and
a communication bus establishing a communication connection between the processor and the storage device;
the processor is used for executing one or more programs stored in the storage device so as to implement the data scheduling method based on hierarchical storage disclosed in any of the above inventions.
Compared with the prior art, the invention has the following beneficial effects:
In the invention, the power spectral density function detection module running the power spectral density function operation logic analyses the historical workload data of the hierarchical storage to obtain the workload period and the workload pattern, and the workload of the data to be scheduled is pre-judged according to this period and pattern so as to provide an accurate, advance decision basis for reasonably setting a plurality of QoS thresholds. The data to be scheduled is thus better guided to migrate reasonably, according to the performance of the hierarchical storage, to at least one storage medium in the storage groups with different read/write performance of the hierarchical storage, the storage performance of the hierarchical storage is brought into full play, and the deployment cost of the hierarchical storage and of a system or software comprising it is reduced.
Drawings
Fig. 1 is a schematic diagram of calculating an EMA value by collecting IO statistics stored hierarchically in a certain period in a fixed period in the prior art, where an abscissa is a time unit and an ordinate is an index of a workload;
FIG. 2 is a schematic diagram of the prior art shown in FIG. 1 showing a long tail effect occurring in a discrete high IOPS read-write scenario, where the abscissa is a time unit and the ordinate is an index of a workload;
FIG. 3 is a general flowchart of a data scheduling method based on hierarchical storage according to the present invention;
FIG. 4 is a schematic diagram of historical workload data, with the abscissa being units of time and the ordinate being an index of the workload;
FIG. 5 is a schematic diagram of the historical workload data shown in FIG. 4 after smoothing, with the abscissa being the unit of time and the ordinate being the index of the workload;
FIG. 6 is a schematic diagram of a Fast Fourier Transform (FFT) performed on the smoothed historical workload data of FIG. 5;
FIG. 7 is a schematic diagram of selecting and calculating the least common multiple of significant frequency points included in a power spectral density function to determine the workload cycle that a hierarchical store has for historical workload data, with the abscissa being the unit of time and the ordinate being the index of the workload;
FIG. 8 is a topology diagram including detection of workload periods and workload patterns of tiered storage devices using power spectral density functions;
fig. 9 is a schematic diagram of a workload pattern output by the power spectral density function detection module during a preset period for historical workload data, with the abscissa being a time unit and the ordinate being an index of the workload;
FIG. 10 is a topology diagram of an inventive data scheduling system based on hierarchical storage;
fig. 11 is a schematic diagram of issuing a policy for configuring a first QoS threshold to a storage cluster, where the abscissa is a time unit and the ordinate is an index of a workload;
fig. 12 is a schematic diagram of a workload cycle (corrected Pattern) divided into two cycle segments having different workload cycles (i.e., a high workload cycle and a low workload cycle) within a certain period by the first QoS threshold set in fig. 11, the abscissa is a time unit and the ordinate is an index of the workload;
FIG. 13 is a topology diagram of a computer device in one embodiment incorporating a hierarchical storage based data scheduling system of the present invention;
FIG. 14 is a topology diagram of a computer device incorporating a hierarchical storage based data scheduling system of the present invention in another embodiment;
FIG. 15 is a diagram of two operation lists (i.e., a cold data list 241 and a hot data list 243) output by the workload calculation module;
fig. 16 is a schematic diagram of allocating a first QoS threshold and a second QoS threshold to a storage cluster to divide a whole workload cycle (corrected Pattern) into three cycle segments with different workload cycles (i.e., a high workload cycle, a medium workload cycle, and a low workload cycle), where an abscissa is a time unit and an ordinate is an index of a workload;
FIG. 17 is a diagram of three operation lists (i.e., cold data list 241, warm data list 242, and hot data list 243) output by the workload calculation module;
FIG. 18 is a topology diagram of an electronic device of the present invention.
Detailed Description
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Before describing the various embodiments of the present application in detail, the meanings of the main technical terms involved need to be defined and explained.
The terms "hot data", "warm data" and "cold data" divide data according to its access frequency: hot data is accessed more frequently than warm data, and warm data more frequently than cold data. Hot data requires a higher tier of the hierarchical storage, since it is often used by CRM, ERP and even e-mail applications and is needed for the daily operations of an enterprise. In such storage tiers performance is paramount, although cost remains a consideration.
Warm data includes, for example, older data such as e-mail more than a few days old or data of completed transactions. Such data is accessed relatively infrequently but must still be accessible when needed. In this storage tier the most important consideration is cost, subject to a minimum performance threshold.
Cold data may never be accessed again (or is only accessed occasionally), but needs to be archived and retained to comply with regulatory or other legal requirements, or simply because it may have some value at an indeterminate time in the future, perhaps for big data analysis. Ideally, cold data is suited to the lowest tier of the hierarchical storage, with acceptable access times of minutes or hours and with low cost as the most important consideration.
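Purely as an illustration of this access-frequency division (the thresholds below are hypothetical; the application only requires that hot data is accessed more often than warm data, and warm data more often than cold data):

```python
def classify_by_access_frequency(accesses_per_day):
    # Hypothetical cut-off values chosen only for the example.
    if accesses_per_day >= 100:
        return "hot"    # daily-operation data (CRM, ERP, e-mail, ...)
    if accesses_per_day >= 1:
        return "warm"   # older mail, completed transactions
    return "cold"       # archived / compliance / future-analysis data
```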
The term "data to be scheduled" may include data on which write or read operations are performed between the various functional nodes (e.g., storage nodes, compute nodes, servers, etc.) deployed in a distributed system or data center. In the various embodiments, data to be scheduled generally refers to data that needs to be accessed or migrated by an external device, such as a computer, server or data center that is independent of the hierarchical storage (device).
The term "communication channel" refers to a data transmission path; a communication channel includes, but is not limited to, various physical buses, virtual buses, or various broadcast and unicast protocols for data/messages.
Example one:
Hierarchical storage is also known as "Hierarchical Storage Management". Hierarchical storage stores data in storage media of different tiers and performs operations such as automatic or manual data migration and copying between the different media. Hierarchical storage is also a specific application and implementation of information lifecycle management.
The data scheduling method based on tiered storage according to this embodiment discloses how data scheduling is performed on cold data and hot data based on the tiered storage, or on the storage groups with different read/write performance in the tiered storage device 20 (20A) shown in fig. 10 (or fig. 13 or fig. 14).
Referring to fig. 3, in the present embodiment, the data scheduling method based on hierarchical storage (hereinafter referred to as "data scheduling method") includes the following steps S1 to S3.
The data scheduling method operates in a data scheduling system based on hierarchical storage shown in fig. 10. The tiered storage device 20 includes a tiered storage controller 21 and a storage cluster 30 controlled by the tiered storage controller 21, where the storage cluster 30 includes one or more storage groups, and each storage group may include at least one storage medium.
In this embodiment, the storage cluster 30 includes: a first storage group 301 with high read/write performance, a second storage group 302 with medium read/write performance, and a third storage group 303 with low read/write performance. The first storage group 301 is composed of one or more of registers, cache (e.g., Intel Itanium memory) or main memory (e.g., DMA direct memory or a RAMDISK accessed directly by the CPU through a DMA controller), so the storage media composing the first storage group 301 are preferably of relatively small capacity but very high performance. The second storage group 302 is composed of a plurality of SLC solid state disks, and its storage media preferably have relatively large capacity and relatively high performance (lower, however, than that of the storage media of the first storage group 301). The third storage group 303 is composed of one or more of MLC solid state disks, TLC solid state disks, mechanical hard disks, tape storage, RAID 0 to RAID 6 or optical disc storage, and its storage media preferably have the largest capacity and relatively the lowest performance. The storage cluster 30 may include one or more storage groups of different capabilities and is deployed with at least the first storage group 301 for storing hot data. In the present application, the storage cluster is divided into two or three storage groups according to the performance of the storage media in each group, where performance refers to one or more evaluation indexes among the read performance, write performance, access Time (TM) and main memory Bandwidth (BM) of the storage medium, and is preferably evaluated by the access time. The access time is the total time required by a memory to perform a complete read or write operation, i.e., the minimum time interval required between two consecutive independent accesses (read or write operations) to the memory.
The first storage group 301 is used to store hot data (Hot Data) so as to meet the high-performance storage requirements of high speed and low latency needed by data that compute nodes access frequently, such as online-type data. The second storage group 302 is used to store warm data (Warm Data) so as to meet the storage requirements of medium speed and medium latency needed by data that compute nodes access relatively less frequently (such as recently active applications and recently active data, i.e., behavioural data with a certain timeliness rather than purely instantaneous state and behaviour data). The third storage group 303 is used to store cold data (Cold Data) so as to meet the low-performance storage requirements of low speed and high latency needed by data that compute nodes access only occasionally (such as infrequently accessed offline data, for example enterprise backup data, business and operation log data, tickets and statistical data). It should be noted that hot data, warm data and cold data are only relative notions, usually divided according to the frequency and tolerable latency with which the data is accessed; the three types of data are stored in the storage cluster 30 of the hierarchical storage (the hierarchical storage device 20) on storage media with different read/write performance, so as to meet the performance requirements of users when the three types of data (or two types of data) are scheduled after an access request is initiated. The hot data, the warm data and the cold data are stored in the high-performance first storage group 301, the medium-performance second storage group 302 and the low-performance third storage group 303, respectively.
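A simple data-structure view of the three storage groups described above could look as follows (the access-time figures are illustrative assumptions, not values measured or specified by the application):

```python
storage_groups = {
    "first_group_301":  {"media": ["register", "cache", "main memory"],
                         "performance": "high",   "stores": "hot data",
                         "assumed_access_time_us": 0.1},
    "second_group_302": {"media": ["SLC SSD"],
                         "performance": "medium", "stores": "warm data",
                         "assumed_access_time_us": 100},
    "third_group_303":  {"media": ["MLC/TLC SSD", "HDD", "tape", "RAID", "optical disc"],
                         "performance": "low",    "stores": "cold data",
                         "assumed_access_time_us": 10000},
}
```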
First, step S1 is executed to collect historical workload data stored hierarchically, and determine a workload cycle based on a power spectral density function corresponding to the historical workload data.
As shown in fig. 4 to 8, step S1 specifically includes the following steps S11 to S14.
Step S11 is executed by the smoothing operation module 101 in fig. 8: it collects the historical workload data of the hierarchical storage and performs smoothing processing. The historical workload data (the Raw IO statistics in fig. 8) is continuously obtained from the storage cluster 30 of the tiered storage device 20 via the collection module 22. The raw workload data formed over a period of time by the storage media of different capabilities in the storage cluster 30 is shown in fig. 4, where the historical workload data (raw workload data) forms irregular glitches. After the smoothing processing of step S11, the smoothed workload data of fig. 5 is obtained. The smoothing process filters the glitches out of the raw workload data signal and is briefly illustrated by formula (1) below. A smoothing window N may be specified in the smoothing process to define the window size, based on the time sequence, used for smoothing.
x_after_smooth(i) = [x(i-N) + x(i-N+1) + ... + x(i+N)] / (2N+1)    formula (1);
In formula (1), the signal (i.e. the signal formed by the historical workload data) is averaged within a smoothing window of length 2N+1: the parameter x is the signal, the average is taken in a window extending N points on each side of the time point i, and the parameter N is a specified length in the time series.
The historical workload data collected from the tiered storage devices 20 is continuously forwarded by the tiered storage controller 21 to the power spectral density function detection module 10. The hierarchical storage device 20 has an equivalent technical meaning to "hierarchical storage" in the embodiments of the present application.
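Formula (1) above can be written directly as the following sketch (Python; leaving the boundary samples, where the full window does not fit, unchanged is an implementation choice not specified in the text):

```python
def smooth(x, n):
    """Centered moving average over a window of 2n+1 samples, per formula (1)."""
    out = list(x)  # boundary samples are kept as-is
    for i in range(n, len(x) - n):
        out[i] = sum(x[i - n:i + n + 1]) / (2 * n + 1)
    return out
```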
Step S12 is executed by the smoothing operation module 101 in fig. 8, and performs autocorrelation calculation on the smoothed historical workload data. The autocorrelation calculation is obtained by the operation of an autocorrelation function well-established in the art. The autocorrelation function calculates an average measure of the characteristics of the signal in the time domain.
Step S13 is executed by the fast Fourier transform module 102 in fig. 8: the power spectral density function corresponding to the historical workload data is obtained by applying a Fast Fourier Transform (FFT) to the historical workload data after the autocorrelation calculation. The Power Spectral Density Function (PSDF) is determined from the autocorrelation calculation and the FFT calculation, and the FFT calculation yields the PSDF shown in fig. 6. Since this process is prior art, it is not described here in detail. The PSDF in fig. 6 is a frequency-domain analysis with a period of 1400 minutes, whereas fig. 7 is a time-domain analysis. Frequency-domain analysis describes a signal in terms of sine waves, converting the signal onto a frequency axis used as the coordinate, whereas time-domain analysis represents the dynamic signal with the time axis as the coordinate. Accordingly, the abscissa of the power spectral density function PSDF in fig. 6 covers twice the time axis used as the abscissa of the time-domain signal in fig. 7.
Step S14 is performed by the frequency point selection module 103 in fig. 8: the significant frequency points (f1, f2, ..., fn) contained in the power spectral density function are selected and their least common multiple f is calculated, so as to determine the workload period of the hierarchical storage with respect to the historical workload data. The workload period is T = 1/f. The vertical lines between adjacent peaks in fig. 7 mark the normalized signal periods (seven in total). "Filtered workload" denotes the denoised historical workload data; it is used to obtain the workload pattern, shown in fig. 9, that the power spectral density function detection module 10 outputs for the historical workload data within a preset period (one period, 1400 minutes).
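Continuing the sketch, the selection of significant frequency points and their combination into a workload period could look like this (Python; rounding the candidate periods to whole minutes and combining them with a least common multiple is one possible reading of the step described above):

```python
from math import gcd
from functools import reduce

def workload_period_minutes(freqs, psd, significance=0.5):
    """freqs/psd: frequency axis and power spectral density of the workload.
    Returns a workload period T, in whole minutes, that contains every
    significant component period a whole number of times."""
    significant = [f for f, p in zip(freqs, psd)
                   if f > 0 and p >= significance * max(psd[1:])]
    candidate_periods = [max(1, round(1.0 / f)) for f in significant]
    return reduce(lambda a, b: a * b // gcd(a, b), candidate_periods, 1)
```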
Upon identifying a workload period of a storage cluster 30 in the tiered storage 20 based on historical workload data, a workload Pattern (Pattern) may be determined within a determined detection period. The power spectral density function detection module 10 feeds back the adjusted workload period and workload pattern, i.e., the quantized workload period and the quantized workload pattern, to the hierarchical storage controller 21.
In particular, the workload pattern of the cold data and the hot data can be flexibly adjusted, so that the method is obviously different from the similar technical scheme in the prior art. Referring to fig. 10, the hierarchical storage controller 21 initiates a request for executing determining a Workload cycle based on a power spectral density function corresponding to historical Workload data to the power spectral density function detection module 10 through the Workload pattern request module 23, where the request may modify read/write conditions of various types of data according to storage media with different performances in the storage cluster 30, and initiates a modification request to the power spectral density function detection module 10 to dynamically adjust a Workload pattern (Workload pattern). The power spectral density function detection module 10 is disposed in a logic operation device 110 logically independent from the hierarchical storage controller 21, and the logic operation device 110 runs power spectral density function operation logic. The logical operation device 110 can be regarded as a data center, a physical (virtual) server, a cloud computing platform, a single chip, or any one of the prior art having hardware devices and/or software systems executing computer programs. The hierarchical storage controller 21 stores an array of entries, each entry corresponding to a row in the memory. Each entry has a Tag (Tag) and several flags (flag) indicating the state of the cache line. The tag is comprised of bits that enable hierarchical memory controller 21 to distinguish the memory cell currently mapped by the row. Therefore, in the present embodiment, by independently disposing the power spectral density function detection module 10 in the logic operation device 110, the design difficulty of the hierarchical storage controller 21 can be reduced and the reliability of the operability thereof can be improved.
Then, step S2 is executed: a workload pattern matched with the workload period is dynamically set for the hierarchical storage within a preset period according to the workload period. The step S2 specifically includes the following steps:
Step S21, the data to be scheduled in the hierarchical storage processing is pre-judged. This pre-judgment is performed by the hierarchical storage controller 21 in fig. 10, and may further be performed by the workload calculation module 25.
Step S22, a policy configured with one QoS threshold (i.e., the first QoS threshold) is issued to the tiered storage (specifically, to the storage cluster 30 of the tiered storage device 20), the workload period is divided into two kinds of period segments with different workloads by this QoS threshold, and a workload pattern dynamically set for the tiered storage within the preset period and matched with the period segments is output. The hierarchical storage controller 21 in fig. 10 issues an optimization policy containing the first QoS threshold to the storage cluster 30. By issuing the optimization policy and using the workload pattern dynamically set within the preset period and matched with the workload period to arrange the writing of the data to be scheduled into the corresponding storage groups, hot data is prevented from being written into the second storage group 302 and/or the third storage group 303 with relatively low read/write performance, and cold data is prevented from being written into the first storage group 301 with the highest read/write performance, thereby optimizing the reasonable scheduling of the hot data and the cold data (i.e., the aforementioned data to be scheduled). Meanwhile, "scheduling" in this embodiment may be understood as data operations such as writing to, reading from, deleting from or modifying one or more storage media in the storage cluster 30.
Referring to fig. 8 and 9, after the corrected workload period and pattern output by the workload period adjustment module 104 in the power spectral density function detection module 10 are obtained, the workload pattern (corrected pattern, or corrected workload pattern) shown in fig. 9 is formed; this workload pattern is a prediction for the data to be scheduled. Referring to fig. 11 and 12, after the optimization policy containing the first QoS threshold is issued to the storage cluster 30, the workload pattern is divided into three period segments, i.e., the two low-workload periods and one high-workload period in fig. 12. Scheduling operations on cold data are performed during the low-workload periods, and scheduling operations on hot data during the high-workload period.
In step S22, after the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the period segments has been output, the method further includes: associating each period segment with one of the storage groups with different read/write performance that constitute the hierarchical storage (i.e., the first storage group 301, the second storage group 302 or the third storage group 303). A low-workload period is associated with the third storage group 303 and a high-workload period with the first storage group 301.
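A minimal sketch of this single-threshold segmentation and of the association between period segments and storage groups is shown below (Python; the threshold value and the group labels are placeholders):

```python
def split_by_qos_threshold(predicted_load, first_qos_threshold):
    """Label each point of the corrected workload pattern and associate it
    with a storage group: high workload -> group 301 (hot data),
    low workload -> group 303 (cold data)."""
    segments = []
    for t, load in enumerate(predicted_load):
        if load >= first_qos_threshold:
            segments.append((t, "high workload", "first storage group 301"))
        else:
            segments.append((t, "low workload", "third storage group 303"))
    return segments
```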
Finally, step S3 is executed: a migration priority is set for the data to be scheduled according to the workload pattern within the preset period, so that the data to be scheduled is scheduled, according to the migration priority, to at least one storage medium in the storage groups with different read/write performance that constitute the hierarchical storage. Preferably, as shown in fig. 15, in this embodiment, after step S3 sets the migration priority for the data to be scheduled according to the workload pattern within the preset period, the method further includes: generating operation lists and displaying them visually, the operation lists being associated with the migration priority of the data to be scheduled. In this embodiment the operation lists comprise a cold data list 241 and a hot data list 243; a warm data list 242 may also be provided.
The operation lists are presented to HOST 60 through the read/write controller 40 and a bus; HOST 60 can be configured as a virtual device (a cloud host) or a physical device (a physical computer with a display) that provides a UI interface for the user. Referring to fig. 13 and 14, the hierarchical storage device 20 is deployed at the bottom logical level of the virtualization cluster 100, one or more processors 70 are mounted on the bus of the virtualization cluster 100, and a user performs the corresponding scheduling operations on the data to be scheduled by logging in to HOST 60 locally or remotely.
All data to be scheduled (including cold data and hot data) carries a distinguishing workload pattern so as to match the time period suitable for executing its scheduling. When it is determined that the cold data and hot data waiting for scheduling are to be stored on one or more storage media in the storage cluster 30, the workload patterns of the entire storage cluster 30 over a plurality of preset periods need to be examined, and the order in which migration operations are executed on the different data (i.e., the cold data and the hot data) is allocated reasonably according to priority: scheduling operations on the cold data are executed preferentially during a low-workload period, and once the first QoS threshold is exceeded, scheduling operations on the cold data are stopped and scheduling operations on the hot data are executed. When the workload pattern falls below the first QoS threshold, the scheduling operations on the hot data are stopped and the scheduling operations on the cold data are resumed. In this way the different read/write performances of the first storage group 301, the second storage group 302 and the third storage group 303 in the storage cluster 30 are used reasonably, the storage performance of the hierarchical storage is brought into full play, and the deployment cost of the hierarchical storage and of a system or software comprising it is reduced.
Preferably, in this embodiment, after step S3 generates the operation lists, the method further includes: detecting the current workloads of the first storage group 301 and the third storage group 303 that constitute the hierarchical storage, so as to schedule the data to be scheduled corresponding to the migration priorities into the first storage group 301 and the third storage group 303 in descending order of migration priority. Preferably, after the operation lists are generated in step S3, the method further includes: detecting the current workload of each individual storage medium contained in the first storage group 301 and the third storage group 303, so as to schedule data to be scheduled that have the same migration priority to one or more storage media with a relatively lighter workload within the same storage group. This further refines the scheduling of data of the same nature (for example several items of data to be scheduled that are all hot data) and achieves a reasonable distribution of the data within a storage group of the same tier.
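For data of the same migration priority, picking the relatively least-loaded media within a group could be sketched as follows (Python; the per-medium workload figures are assumed to come from the monitoring described above):

```python
def order_media_by_workload(group_workloads):
    """group_workloads maps medium name -> current workload; data of equal
    migration priority is placed on the lightest-loaded media first."""
    return sorted(group_workloads, key=group_workloads.get)

# e.g. order_media_by_workload({"ssd0": 0.7, "ssd1": 0.2, "ssd2": 0.4})
# -> ["ssd1", "ssd2", "ssd0"]
```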
Step S3 further includes: dividing the data to be scheduled into cold data, warm data and hot data, preferentially executing the operation of scheduling the hot data to the first storage group 301, generating an optimized time period for the hot-data scheduling operation after all the hot data have been scheduled, binding the optimized time period to the migration priority, and excluding migration operations on the warm data and the cold data from the optimized time period within the preset period. In this way, the responsiveness for the data that users access most frequently is guaranteed at the least, and the response time of the hierarchical storage device 20 that deploys and runs this data scheduling method to access requests initiated by users is further shortened, so that latency is further reduced.
The data scheduling method disclosed in this embodiment can avoid I/O traffic congestion of the hierarchical storage device by using the detected workload signal period instead of following a fixed signal period, and can learn and train the true workload period and workload pattern more accurately through the power spectral density function operation logic included in the power spectral density function detection module 10.
Meanwhile, in this embodiment, all the data to be migrated are planned according to the workload pattern and the migration priority during the migration process. The workload period and workload pattern can serve as preconditions for deciding whether to migrate the data to be migrated, and optimization and scheduling can be carried out in advance of a periodic high load (i.e., before the left-hand intersection point formed by the corrected pattern and the first QoS threshold in fig. 11), so that the hot data can be saved in the high-performance, low-latency first storage group 301 during the scheduled operation. In data read/write scenarios that are periodic or contain instantaneous high loads, the read/write capability of the low-performance third storage group 303 deployed in the hierarchical storage device can be fully used for the cold data, and the drawback of high latency caused by the hot data being unable to use the high-performance first storage group 301 for read/write operations when the high load actually occurs is avoided. The storage advantages of the storage media with different read/write performance in the hierarchical storage device 20 for the data to be scheduled are thus brought into full play, the data to be scheduled is better guided to migrate reasonably, according to the performance of the hierarchical storage, to at least one storage medium in the storage groups with different read/write performance of the hierarchical storage, and the construction and maintenance cost of the hierarchical storage device is reduced.
Example two:
referring to fig. 16 and fig. 17, this embodiment shows a modified embodiment of the data scheduling method based on hierarchical storage according to the present invention.
Compared with the first embodiment, the main difference of this embodiment is that step S2 specifically includes the following steps. Step S21, the data to be scheduled in the hierarchical storage processing is pre-judged. Step S22, an optimization policy configured with two QoS thresholds (i.e., a first QoS threshold and a second QoS threshold) is issued to the hierarchical storage, the workload period is divided into three kinds of period segments with different workloads by the two QoS thresholds, and the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the period segments is output. This embodiment discloses in detail the scheduling process of cold data, warm data and hot data based on the hierarchical storage, or on the storage groups with different read/write performance in the hierarchical storage device shown in fig. 10 (or fig. 13 or fig. 14).
An optimization policy containing a first QoS threshold and a second QoS threshold, the latter being higher than the first, is issued to the storage cluster 30 by the hierarchical storage controller 21 in fig. 10. Referring to fig. 16, the entire corrected pattern is divided, from left to right, into a low-workload period, a medium-workload period, a high-workload period, a medium-workload period and a low-workload period. Scheduling operations on the cold data are executed during the low-workload periods, scheduling operations on the warm data during the medium-workload periods, and scheduling operations on the hot data during the high-workload period. The four critical points formed between the high-, medium- and low-workload periods are the intersection points of the first and second QoS thresholds with the entire corrected pattern.
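With the second QoS threshold added, the segmentation sketched in the first embodiment extends naturally (Python; the threshold values and labels are again illustrative only):

```python
def split_by_two_qos_thresholds(predicted_load, first_qos, second_qos):
    """first_qos < second_qos; returns, for each point of the corrected
    pattern, which data class is scheduled and into which storage group."""
    labels = []
    for load in predicted_load:
        if load >= second_qos:
            labels.append("high workload: hot data -> first storage group 301")
        elif load >= first_qos:
            labels.append("medium workload: warm data -> second storage group 302")
        else:
            labels.append("low workload: cold data -> third storage group 303")
    return labels
```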
Referring to fig. 17, in the present embodiment, the operation list generated by the workload calculation module 25 includes a cold data list 241, a warm data list 242, and a hot data list 243. In this embodiment, after the step S3 generates the operation list, the method further includes: detecting the current workload of the first memory group 301, the second memory group 302 and the third memory group 303 which form the hierarchical storage, so as to schedule the data to be scheduled corresponding to the migration priority into the first memory group 301, the second memory group 302 and the third memory group 303 which form the hierarchical storage according to the sequence of the migration priority from high to low. The current workload of the single storage media respectively contained in the first storage group 301, the second storage group 302 and the third storage group 303 is detected, so that the data to be scheduled with the same migration priority are scheduled to one or more storage media with lighter relative workload in the same storage group.
The technical solutions of the present embodiment and the first embodiment having the same parts are described in the first embodiment, and are not described herein again.
Example three:
referring to fig. 13, according to the data scheduling method based on hierarchical storage disclosed in the first embodiment, a data scheduling system based on hierarchical storage (hereinafter referred to as "data scheduling system") is also disclosed in the present embodiment.
In this embodiment, a data scheduling system based on hierarchical storage includes: the system comprises a hierarchical storage controller 21, a storage cluster 30 mounted to the hierarchical storage controller, and a power spectral density function detection module 10. The tiered storage controller 21 deploys a workload computation module 25. The power spectral density function detection module 10 collects historical workload data stored hierarchically, and determines a workload cycle based on a power spectral density function corresponding to the historical workload data. The workload calculation module 25 dynamically sets a workload pattern matching the workload period for the hierarchical storage in a preset period, and sets a migration priority for the data to be scheduled in the preset period according to the workload pattern, so as to schedule the data to be scheduled to at least one storage medium in the memory groups with different read/write performances forming the hierarchical storage according to the migration priority.
The power spectral density function detection module 10 comprises: a smoothing operation module 101, which performs smoothing processing on the historical workload data of the hierarchical storage; a fast Fourier transform module 102, which performs an autocorrelation calculation on the smoothed historical workload data and then obtains the power spectral density function corresponding to the historical workload data by means of a fast Fourier transform; a frequency point selection module 103, which selects a plurality of significant frequency points contained in the power spectral density function and calculates their least common multiple so as to determine the workload period of the hierarchical storage with respect to the historical workload data; and a workload period adjustment module 104, which pre-judges the data to be scheduled in the hierarchical storage processing through the power spectral density function operation logic, issues to the hierarchical storage an optimization policy that configures a plurality of QoS thresholds, divides the workload period into at least two period segments with different workloads by means of the QoS thresholds, and outputs to the workload pattern request module 23 the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the period segments.
The hierarchical storage controller 21 includes: the workload calculation module 25, an acquisition module 22, a workload pattern request module 23 and an optimization policy forwarding module 24. The acquisition module 22 acquires the historical workload data of the hierarchical storage: it collects the historical workload data of each storage group in the storage cluster 30 along the path shown by the double-headed arrow 222 and sends the data to the power spectral density function detection module 10. The workload pattern request module 23 issues to the power spectral density function detection module 10 a request to determine the workload period based on the power spectral density function corresponding to the historical workload data, and receives the workload period generated by the power spectral density function operation logic included in the power spectral density function detection module 10. The optimization policy forwarding module 24 retrieves, from the workload calculation module 25, the workload pattern dynamically set for the hierarchical storage within the preset period and matched with the workload period, and forwards it to the storage cluster 30. At the same time, the workload calculation module 25 also generates the optimized time period in which the hot data is to be scheduled, which is forwarded to the storage cluster 30 by the optimization policy forwarding module 24; the operation of generating the optimized time period for the hot-data scheduling and binding it to the migration priority may likewise be performed by the optimization policy forwarding module 24.
The power spectral density function detection module 10 is disposed in a logic operation device 110 logically independent from the hierarchical storage controller 21, and the logic operation device 110 runs power spectral density function operation logic. Meanwhile, the logic operation device 110 is logically independent from the hierarchical storage device 20, and is connected with the logic operation device 110 through one or more communication channels by the collection module 22 and the workload mode request module 23. The workload calculation module 25 is connected to the storage cluster 30 via communication channels adapted to different storage media in the first storage group 301, the second storage group 302 or the third storage group 303.
The data scheduling method disclosed in the first embodiment and/or the second embodiment is executed by the data scheduling system based on hierarchical storage disclosed in this embodiment, and please refer to the description in the first embodiment and/or the second embodiment, which is not repeated herein.
Example four:
Referring to fig. 14, this embodiment is based on the technical solution of the data scheduling system disclosed in the third embodiment and provides a modification of that data scheduling system.
Compared with the foregoing embodiments, especially the third embodiment, the main difference of the present embodiment is that the power spectral density function detection module 10 running the power spectral density function operation logic is disposed in the hierarchical storage device 20A as a whole, and the separate logic operation device 110 shown in the third embodiment is omitted. The power spectral density function detection module 10 may establish an internal communication channel with the read/write controller 40 through a RESTful API so as to be controlled by the HOST 60. A user or administrator logs in to the HOST 60 to create or modify the computer code or programs that make up the power spectral density function operation logic.
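As one possible way of realizing such an internal RESTful channel, the sketch below exposes the parameters of the power spectral density function operation logic over HTTP so that HOST 60 (or an administrator logged in to it) can read and adjust them. The use of Flask, the route names and the payload fields are illustrative assumptions; the embodiment only states that a RESTful API connects the detection module to the read/write controller 40 and does not specify the interface.

```python
from flask import Flask, request, jsonify

# Hypothetical routes and parameters; not defined by the embodiment.
app = Flask(__name__)

psd_config = {"smoothing_window": 5, "significance_threshold": 0.2}

@app.route("/psd/config", methods=["GET"])
def get_config():
    # HOST 60 inspects the current operation-logic parameters of the
    # power spectral density function detection module.
    return jsonify(psd_config)

@app.route("/psd/config", methods=["PUT"])
def update_config():
    # ...or adjusts them, e.g. after modifying the operation logic.
    psd_config.update(request.get_json(force=True))
    return jsonify(psd_config), 200

if __name__ == "__main__":
    app.run(port=8080)  # port chosen arbitrarily for this sketch
```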
The technical solutions of the present embodiment that are the same as those of the third embodiment are described in the third embodiment and will not be repeated here.
Example five:
Referring to fig. 18, this embodiment further discloses an electronic device 500, which includes:
a processor 51, a storage device 52 composed of at least one storage unit, and a communication bus 53 establishing a communication connection between the processor 51 and the storage device 52. The processor 51 is configured to execute one or more programs stored in the storage device 52 to implement the data scheduling method based on hierarchical storage disclosed in the first embodiment and/or the second embodiment.
Specifically, the storage device 52 may be composed of storage units 521 to 52j, where the parameter j is a positive integer greater than or equal to 1. The processor 51 may be an ASIC, an FPGA, a CPU, an MCU, or other physical hardware or virtual device with an instruction processing function. The form of the communication bus 53 is not particularly limited: it may be an I2C bus, an SPI bus, an SCI bus, a PCI-E bus, an ISA bus, or the like, and may be changed reasonably according to the specific type of the electronic device 500 and the requirements of the application scenario. The communication bus 53 is not the focus of the present application and is not described in detail here.
The storage device 52 may be based on a distributed file system such as Ceph or GlusterFS, may be a RAID 0-7 disk array, or may be configured as one or more hard disks or removable storage devices, a database server, an SSD (Solid State Disk), a NAS storage system or a SAN storage system. The electronic device 500 may be configured as a hyper-converged all-in-one machine, a computer, a server, a data center, a virtual cluster, a portable mobile terminal, a Web system, a financial payment platform or ERP system, a virtual online payment platform/system, and the like. The hyper-converged all-in-one machine is a high-performance multi-node server that mainly adopts hierarchical storage and server virtualization technologies, highly integrates computing nodes, storage resources and network switching into a 1U, 2U or 4U server, and provides hyper-converged infrastructure for enterprises or end users so as to comprehensively improve the enterprise's IT (information technology) capability. The storage device 52 can be regarded as a generic concept including the data scheduling system disclosed in the third embodiment and/or the fourth embodiment, or the hierarchical storage device 20 (20A) of the data scheduling system can be regarded as part or all of the storage device 52.
In particular, the electronic device 500 disclosed in this embodiment schedules data according to the data scheduling method based on hierarchical storage disclosed in the first embodiment and/or the second embodiment, so that it can reliably respond to one or more parallel tasks corresponding to access requests or operations initiated by users at clients (for example, virtual machines (VMs) in the virtualization cluster 100) in a wired or wireless manner. Especially in scenarios with strict requirements on real-time performance and security, such as online payment systems of shopping websites, settlement systems of financial institutions and electronic ticketing systems, the electronic device 500 therefore has significant technical application value. The technical solutions of the electronic device 500 disclosed in this embodiment that are the same as those in the first to fourth embodiments are described in the first to fourth embodiments and are not repeated here.
The electronic device 500 disclosed in this embodiment may be understood as a physical device (e.g., a POS machine or an ATM), a software system (a financial system or an ERP system) or an internet online application (APP software) running the data scheduling method based on hierarchical storage disclosed in the first and/or second embodiments, or even as two or more computer systems/data centers interconnected by optical fiber or network cable to form a direct-connection topology, a tree topology or a star topology.
In this embodiment, the integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (14)

1. A data scheduling method based on hierarchical storage is characterized by comprising the following steps:
s1, collecting historical workload data stored in a layered mode, and determining a workload period based on a power spectral density function corresponding to the historical workload data;
s2, dynamically setting a work load mode matched with the work load cycle in a preset cycle for hierarchical storage according to the work load cycle;
s3, setting a migration priority for the data to be scheduled according to the workload mode in a preset period, and scheduling the data to be scheduled to at least one storage medium in the memory groups with different read/write performances forming layered storage according to the migration priority.
2. The data scheduling method of claim 1, wherein the step S1 specifically includes the following steps:
s11, collecting hierarchically stored historical workload data and executing smoothing processing;
s12, performing autocorrelation calculation on the smoothed historical workload data;
s13, calculating the historical workload data after the autocorrelation calculation by using fast Fourier transform to obtain a power spectral density function corresponding to the historical workload data;
s14, selecting and calculating the least common multiple of the significant frequency points included in the power spectral density function to determine the workload period of the hierarchical storage to the historical workload data.
3. The data scheduling method of claim 2, wherein the step S2 specifically includes the following steps:
s21, pre-judging data to be scheduled in layered storage processing;
s22, issuing a strategy for configuring a plurality of QoS thresholds to the hierarchical storage, dividing the work load cycle into at least two cycle segments with different work loads through the QoS thresholds, and outputting the work load mode which is dynamically set for the hierarchical storage and is matched with the cycle segments in the preset cycle.
4. The data scheduling method of claim 3, wherein in step S22, after outputting the workload pattern matched with the period segment dynamically set for the tiered storage in the preset period, the method further comprises: each cycle segment is associated with a memory group of different read/write performance that makes up the hierarchical storage.
5. The data scheduling method of claim 1, wherein after the step S3 of setting the migration priority of the data to be scheduled according to the workload mode in a preset period, the method further comprises: generating at least one operation list and visually displaying the operation list, wherein the operation list is associated with the migration priority of the data to be scheduled;
the operation list comprises one or more of a hot data list, a warm data list and a cold data list.
6. The data scheduling method of claim 5, wherein the memory group comprises: a first memory group having a high read/write performance, a second memory group having a medium read/write performance, and a third memory group having a low read/write performance;
the first storage group comprises one or more of a register, a cache or a main memory, the second storage group comprises a plurality of SLC solid state hard disks, and the third storage group comprises one or more of an MLC solid state hard disk, a TLC solid state hard disk, a mechanical hard disk, a tape storage, RAID 0-RAID 6 or a compact disc storage.
7. The data scheduling method of claim 6, wherein after generating the operation list in step S3, the method further comprises: and detecting the current workloads of the first memory group, the second memory group and the third memory group which form the hierarchical storage, so as to schedule the data to be scheduled corresponding to the migration priority into the first memory group, the second memory group and the third memory group which form the hierarchical storage according to the sequence of the migration priority from high to low.
8. The data scheduling method of claim 6, wherein after generating the operation list in step S3, the method further comprises: and detecting the current workload of the single storage media respectively contained in the first storage group, the second storage group and the third storage group so as to schedule the data to be scheduled with the same migration priority to one or more storage media with lighter relative workload in the same storage group.
9. The data scheduling method according to claim 7 or 8, wherein the step S3 further comprises: the method comprises the steps of dividing data to be scheduled into cold data, warm data and hot data, preferentially executing the operation of scheduling the hot data to a first memory group, generating an optimized time period of the scheduling operation of the hot data after all the hot data are scheduled, binding the optimized time period with a migration priority, and excluding the migration operation of the warm data and the cold data in the optimized time period in a preset period.
10. A hierarchical storage based data scheduling system, comprising:
the system comprises a hierarchical storage controller, a storage cluster mounted to the hierarchical storage controller and a power spectral density function detection module;
the hierarchical storage controller deploys a workload calculation module;
the power spectral density function detection module acquires historical workload data stored in a layered mode and determines a workload period based on a power spectral density function corresponding to the historical workload data;
the workload calculation module dynamically sets a workload mode matched with a workload cycle for hierarchical storage in a preset cycle, and sets a migration priority for data to be scheduled in the preset cycle according to the workload mode, so that the data to be scheduled is scheduled to at least one storage medium in memory groups with different read/write performances forming the hierarchical storage according to the migration priority.
11. The data scheduling system of claim 10 wherein the hierarchical storage controller comprises: the system comprises a workload calculation module, an acquisition module, a workload mode request module and an optimization strategy forwarding module;
the acquisition module acquires historical workload data stored in a layered mode;
the work load mode request module initiates a request for executing a work load cycle determination based on a power spectral density function corresponding to historical work load data to the power spectral density function detection module, and receives the work load cycle generated by the power spectral density function operation logic contained in the power spectral density function detection module;
and the optimization strategy forwarding module calls a working load mode which is dynamically set for hierarchical storage in a preset period and is matched with the working load period from the working load calculation module and forwards the working load mode to the storage cluster.
12. The data scheduling system of claim 11 wherein the power spectral density function detection module comprises:
the smoothing operation module is used for executing smoothing processing on the historical workload data stored in a layered mode;
the fast Fourier transform module is used for performing autocorrelation calculation on the smoothed historical workload data and then obtaining a power spectral density function corresponding to the historical workload data by using fast Fourier transform calculation;
the frequency point selection module is used for selecting and calculating the least common multiple of a plurality of significant frequency points contained in the power spectral density function so as to determine the workload period of the hierarchical storage to the historical workload data;
the work load cycle adjusting module is used for prejudging data to be scheduled in the layered storage processing through power spectral density function operation logic, issuing a strategy for configuring a plurality of QoS thresholds to the layered storage, dividing the work load cycle into at least two cycle segments with different work loads through the QoS thresholds, and outputting a work load mode which is dynamically set for the layered storage in a preset cycle and is matched with the cycle segments to the work load mode request module.
13. The data scheduling system of claim 11 wherein the power spectral density function detection module is deployed in a logical operation device logically independent of the hierarchical memory controller, the logical operation device running power spectral density function operation logic.
14. An electronic device, comprising:
processor, memory device comprising at least one memory unit, and
a communication bus establishing a communication connection between the processor and the storage device;
the processor is configured to execute one or more programs stored in the storage device to implement the data scheduling method based on hierarchical storage according to any one of claims 1 to 9.
CN202110847220.8A 2021-07-27 2021-07-27 Data scheduling method and system based on hierarchical storage and electronic equipment Active CN113515238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110847220.8A CN113515238B (en) 2021-07-27 2021-07-27 Data scheduling method and system based on hierarchical storage and electronic equipment

Publications (2)

Publication Number Publication Date
CN113515238A true CN113515238A (en) 2021-10-19
CN113515238B CN113515238B (en) 2024-02-06

Family

ID=78068564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110847220.8A Active CN113515238B (en) 2021-07-27 2021-07-27 Data scheduling method and system based on hierarchical storage and electronic equipment

Country Status (1)

Country Link
CN (1) CN113515238B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014183514A1 (en) * 2013-11-19 2014-11-20 中兴通讯股份有限公司 Method, device, and computer storage medium for hierarchical storage
CN111367469A (en) * 2020-02-16 2020-07-03 苏州浪潮智能科技有限公司 Layered storage data migration method and system
CN111427844A (en) * 2020-04-15 2020-07-17 成都信息工程大学 Data migration system and method for file hierarchical storage
CN111880744A (en) * 2020-07-29 2020-11-03 苏州浪潮智能科技有限公司 Data migration method and device, electronic equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张广艳; 丘建平: "An automatic data migration method in tiered storage systems", Journal of Computer Research and Development, no. 08 *
皮阳: "Research and implementation of Ceph tiered storage optimization strategy", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113515238B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
US11487760B2 (en) Query plan management associated with a shared pool of configurable computing resources
US20200089624A1 (en) Apparatus and method for managing storage of data blocks
US9703500B2 (en) Reducing power consumption by migration of data within a tiered storage system
US8521986B2 (en) Allocating storage memory based on future file size or use estimates
US10564870B1 (en) Placing data storage volumes
US10509739B1 (en) Optimized read IO for mix read/write scenario by chunking write IOs
CN110515539A (en) Cloud disk hanging method, device, equipment and storage medium based on cloud storage
US11474945B2 (en) Cache and I/O management for analytics over disaggregated stores
US9501313B2 (en) Resource management and allocation using history information stored in application's commit signature log
US9984139B1 (en) Publish session framework for datastore operation records
US11914894B2 (en) Using scheduling tags in host compute commands to manage host compute task execution by a storage device in a storage system
Yin et al. Muse: A multi-tiered and SLA-driven deduplication framework for cloud storage systems
Zhou et al. Improving big data storage performance in hybrid environment
Yang et al. Tombolo: Performance enhancements for cloud storage gateways
Noel et al. Taming performance hotspots in cloud storage with dynamic load redistribution
Murugan et al. flexStore: A software defined, energy adaptive distributed storage framework
CN113515238B (en) Data scheduling method and system based on hierarchical storage and electronic equipment
US20160110219A1 (en) Managing i/o operations in a shared file system
US10346054B1 (en) Policy driven IO scheduler resilient to storage subsystem performance
WO2016060700A1 (en) File system journaling
CN110865768A (en) Write cache resource allocation method, device, equipment and storage medium
Kim et al. Measuring the optimality of Hadoop optimization
Feng et al. Wei Zhou
성민영 A Machine Learning-based Methodology to Detect I/O Performance Bottlenecks for Hadoop Systems
KR20120011461A (en) Distributed file management apparatus and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant