CN117032986A - Storage system flow control method and system - Google Patents
- Publication number
- CN117032986A (application number CN202311090387.XA)
- Authority
- CN
- China
- Prior art keywords
- time period
- storage system
- flow control
- end service
- current time
- Legal status
- Pending
Classifications
- G06F9/505: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the load
- G06F11/3024: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
- G06F11/3034: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
- G06F18/22: Matching criteria, e.g. proximity measures
- G06F9/5016: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resources being hardware resources other than CPUs, Servers and Terminals, the resource being the memory
Abstract
The invention belongs to the technical field of storage and specifically relates to a storage system flow control method and system. The method monitors key index data of the bottom layer of the storage system over the current time period to determine the change trend of the bottom-layer load and to analyze whether the bottom-layer load is overloaded. If it is overloaded, the method uses the monitored key index data of the front-end service over the current time period to analyze the change trend of front-end service performance in the current and future time periods. If front-end performance in the current time period is declining by more than a corresponding threshold, the method analyzes the similarity between the bottom-layer key index data and the front-end key index data for the current time period; if the similarity is high, or front-end performance is predicted to decline in the future time period, flow control of the front-end service is started. The invention effectively relieves overload at the bottom layer of the storage system while avoiding excessive flow control of front-end services.
Description
Technical Field
The invention belongs to the technical field of storage, and particularly relates to a storage system flow control method and system.
Background
In the field of storage technology, flow control is a key technique for ensuring that a storage system can still provide a relatively stable quality of service when it is overloaded. Most flow control schemes are built around the following two aspects:
1. Accurately identify the characteristics of each service type and handle the services differently. Specifically, the services in the storage system are classified, different types of services are given different priorities, and scheduling by priority guarantees the quality of service of the front-end services. For example, the background deletion service and the periodic data verification service are both set to low priority, the data reconstruction service to medium priority, and the front-end read-write service to high priority.
2. Distribute the total processing capacity of the system among the different services through a flow control algorithm. Specifically, the processing capacity of the storage system is converted into tokens: for example, the number of tokens a single disk can issue is determined by its storage medium type, the total number of tokens the system can issue is then calculated from the total number of storage media in the system, and different numbers of tokens are issued to services of different priorities according to a given algorithm.
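The prior-art token conversion in item 2 can be sketched as follows. This is a minimal illustration, not any specific product's algorithm: the per-disk token capacities (`TOKENS_PER_DISK`), the medium names, and the priority weights (`PRIORITY_WEIGHTS`) are hypothetical values chosen only to show the arithmetic.

```python
from __future__ import annotations

# Hypothetical per-disk token capacity for each storage medium type (illustrative only).
TOKENS_PER_DISK = {"hdd": 200, "ssd": 5000}

# Hypothetical share of the token budget given to each priority level.
PRIORITY_WEIGHTS = {"high": 6, "medium": 3, "low": 1}


def total_tokens(disk_counts: dict[str, int]) -> int:
    """Sum the tokens all disks can issue, converting each disk by its medium type."""
    return sum(TOKENS_PER_DISK[medium] * count for medium, count in disk_counts.items())


def split_by_priority(total: int, weights: dict[str, int]) -> dict[str, int]:
    """Issue tokens to each priority level in proportion to its weight (integer floor)."""
    weight_sum = sum(weights.values())
    return {level: total * w // weight_sum for level, w in weights.items()}


if __name__ == "__main__":
    budget = total_tokens({"hdd": 12, "ssd": 2})
    print(budget, split_by_priority(budget, PRIORITY_WEIGHTS))
```

With 12 HDDs and 2 SSDs the hypothetical budget is 12,400 tokens, split 7,440 / 3,720 / 1,240 across high, medium, and low priority. Integer flooring can leave a few tokens unassigned; a real allocator would hand out the remainder.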
Both techniques have since been improved in several ways, including:
1. The service types are further subdivided and extended. For example, read and write services are managed separately, internal metadata requests of a subsystem are given higher priority, and hotspot migration is managed independently. In short, the finer the classification of service types, the finer the flow control.
2. In the flow control algorithm, other resources such as CPU, memory, and network are factored into the conversion of processing capacity in addition to the processing capacity of the storage media.
3. To let the algorithm adapt more flexibly to different business scenarios, the factors in the algorithm are exposed as configuration items that users can fine-tune.
However, these flow control techniques still have the following drawbacks:
1. Triggering flow control impairs front-end services. In some scenarios, for example when a server failure leads to data reconstruction, the upper layer may have no services or very little service pressure, so more bandwidth could be allocated to data reconstruction (more tokens issued). Yet the priority of data reconstruction cannot be adjusted dynamically, because service priorities are basically set statically and the decision to apply flow control is based entirely on the utilization of the underlying resources. This situation may be called "excessive flow control".
2. The flow control targets are unreasonable. Whether the front-end service is block, object, or file storage, different volumes, buckets, or directories carry different workloads and consume different amounts of IOPS or bandwidth. When flow control is triggered, prior-art schemes cannot reasonably distribute the IOPS or bandwidth reduction required by the bottom layer among the front-end volumes, buckets, or directories.
3. The load of the storage system is reduced blindly by limiting front-end service pressure. In some cases the front-end service pressure has already been limited substantially, yet the overload of the storage system is not relieved to a corresponding degree, so the front-end service is over-throttled while the overload persists.
Disclosure of Invention
The invention aims to provide a storage system flow control method and system that solve the prior-art problem of front-end services being excessively throttled while the overload of the storage system is not effectively relieved.
To solve the above technical problem, the invention provides a storage system flow control method comprising the following steps:
1) Monitor key index data of the bottom layer of the storage system for the current time period to determine the change trend of the bottom-layer load in the current time period and analyze whether the bottom-layer load is overloaded; if it is overloaded, analyze the change trend of front-end service performance in the current and future time periods according to the monitored key index data of the front-end service for the current time period;
2) If front-end service performance in the current time period is declining by more than a preset change trend threshold, analyze the similarity between the key index data of the bottom layer of the storage system and the key index data of the front-end service for the current time period; if the similarity exceeds a preset similarity threshold, or front-end service performance is predicted to decline in the future time period, start flow control of the front-end service.
The beneficial effects of this technical solution are as follows: when the bottom layer of the storage system is overloaded and front-end service performance has dropped by a certain amount, the similarity between the bottom-layer key index data and the front-end key index data is analyzed to judge whether the bottom-layer overload and the front-end performance decline are related. A strong association indicates that throttling the front-end service at this point will effectively relieve the overload; flow control is also applied when front-end performance is predicted to decline in the future time period. Flow control is therefore applied only when it can effectively relieve the bottom-layer overload, which avoids excessive flow control of front-end services and ensures full utilization of all system resources.
Further, the key indexes of the bottom layer of the storage system in step 1) include key indexes of various key resources and of various service modules. The key resources include at least two of storage resources, network resources, and CPU and memory resources. The key indexes for storage resources include at least one of disk utilization, request read-write latency, read-write IOPS, and read-write bandwidth. The key indexes for network resources include at least one of network card read-write bandwidth and the number of packets sent and received by the network card. The key indexes for CPU and memory resources include CPU and memory utilization. The service modules include at least one of a service module that handles data consistency and data redundancy, a cache module that accelerates performance, and a service module that interacts with the storage medium to complete data reads and writes. The key indexes for the cache module include at least one of the current and historical water levels (fill levels) of the cache pool, the rate at which dirty data in the cache pool is flushed to disk, and the utilization, read-write bandwidth, and read-write IOPS of the storage media in the cache pool.
The beneficial effects of this technical solution are as follows: the load of the bottom layer of the storage system is judged and comprehensively evaluated from the key indexes of the various key resources and service modules.
Further, the change trend of front-end service performance in the future time period in step 1) is analyzed as follows: the time-series data of the front-end service's key index for the current time period are fitted with an ARIMA (autoregressive integrated moving average) model to predict the change trend of that key index in the future time period, which is taken as the change trend of front-end service performance in the future time period.
The beneficial effects of this technical solution are as follows: the ARIMA model can accurately analyze the change trend of the data.
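A minimal sketch of the forecasting step, assuming an ARIMA(1,1,0) model (one difference, one autoregressive term, no moving-average term) fitted by conditional least squares in pure Python. The patent does not state the model order or estimation method; a production system would more likely use a statistics library and select the order from the data. The function names and the 5% flat-band `tolerance` are hypothetical.

```python
from __future__ import annotations


def fit_arima_110(series: list[float]) -> tuple[float, float]:
    """Fit d_t = c + phi * d_{t-1} on first differences d_t by ordinary least squares."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least four observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no AR signal, drift only
    c = my - phi * mx
    return c, phi


def forecast(series: list[float], steps: int) -> list[float]:
    """Iterate the AR(1) recursion on differences, then integrate back to levels."""
    c, phi = fit_arima_110(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        out.append(level)
    return out


def future_trend(series: list[float], steps: int = 5, tolerance: float = 0.05) -> str:
    """Label the forecast end point relative to the last observation (hypothetical rule)."""
    end = forecast(series, steps)[-1]
    last = series[-1]
    change = (end - last) / abs(last) if last else end
    if change < -tolerance:
        return "falling"
    if change > tolerance:
        return "rising"
    return "flat"
```

For a steadily declining IOPS series such as 1000, 950, ..., 750, the fitted drift is -50 per sample and the forecast keeps falling, so the future trend is reported as falling, which is the condition that can start front-end flow control in step 2).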
Further, if the key indexes of the bottom layer of the storage system comprise multiple indexes, the similarity between the bottom-layer key index data and the front-end key index data for the current time period is analyzed as follows: first, each bottom-layer key index series for the current time period is normalized; then, for each moment, the normalized values of all bottom-layer key indexes at that moment are averaged to obtain the bottom-layer load at that moment, which yields the bottom-layer load at every moment in the current time period; next, the front-end key index data for the current time period are normalized; finally, a correlation analysis algorithm is applied to the per-moment bottom-layer load and the normalized front-end key index data for the current time period.
Further, the correlation analysis algorithm in step 2) is the Pearson correlation coefficient method.
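A minimal sketch of the normalize, average, and correlate steps described in the two paragraphs above, assuming min-max normalization (the text does not name the normalization method). The function names are hypothetical.

```python
from __future__ import annotations

import math


def min_max(values: list[float]) -> list[float]:
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def bottom_layer_load(metrics: dict[str, list[float]]) -> list[float]:
    """Normalize each bottom-layer metric, then average them at every sampling instant."""
    normalized = [min_max(series) for series in metrics.values()]
    return [sum(point) / len(point) for point in zip(*normalized)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; treated here as "no correlation"
    return cov / (sx * sy)


def similarity(metrics: dict[str, list[float]], frontend: list[float]) -> float:
    """Correlate the composite bottom-layer load with the normalized front-end index."""
    return pearson(bottom_layer_load(metrics), min_max(frontend))
```

Min-max scaling does not change a Pearson coefficient, so normalizing the front-end series mainly puts both inputs on a common scale. If the front-end index is IOPS or bandwidth, which fall as the bottom-layer load rises, the coefficient comes out negative; one option (an assumption, not stated in the text) is to correlate against latency or against the performance drop instead of the raw value.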
Further, if front-end service performance in the current time period is not declining by more than the preset change trend threshold, and the background service pressure is below a preset pressure threshold, flow control of the background services is released.
Further, flow control of the front-end service in step 2) is performed as follows: objects to be throttled are selected from the front-end service, and each object that is neither in the whitelist nor configured with a static QoS value is added to the list of flow control objects whose QoS is to be set. Objects in the whitelist are never throttled, and an object with a static QoS value may not have its QoS reduced below that static value by flow control. Flow control is then performed on each object in the flow control object list.
The beneficial effects of this technical solution are as follows: a whitelist policy and a static QoS policy for front-end services are provided to meet different requirements, so that the decision is evaluated along multiple dimensions.
Further, flow control of each object in the flow control object list is performed in either a weight mode or an equalization mode. In the weight mode, a performance priority is set for each object in the list, different priorities correspond to different degrees of throttling, and each object is throttled according to its priority. In the equalization mode, the total amount of throttling required is distributed evenly among the objects in the list.
The beneficial effects of this technical solution are as follows: the weight mode or the equalization mode can be chosen according to actual requirements, which keeps the method flexible; setting different priorities lets objects be allocated different amounts of resources, which effectively reduces pressure.
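The object-screening rule can be sketched as a simple filter. The `FrontEndObject` type and its fields are hypothetical, and how candidates are first selected from the front-end service is not specified in the text, so the sketch takes an already-selected candidate list as input. It follows the text literally: whitelisted objects and objects with a static QoS value are both left out of the list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FrontEndObject:
    """Hypothetical record for a volume, bucket, or directory."""
    name: str
    current_qos: float                # IOPS (or bandwidth) it currently consumes
    static_qos: float | None = None   # statically configured QoS value, if any


def build_flow_control_list(
    candidates: list[FrontEndObject], whitelist: set[str]
) -> list[FrontEndObject]:
    """Keep candidates that are neither whitelisted nor already given a static QoS value."""
    return [
        obj
        for obj in candidates
        if obj.name not in whitelist and obj.static_qos is None
    ]
```

For example, with candidates `vol-a`, `vol-b` (static QoS 300), and `vol-c` (whitelisted), only `vol-a` ends up in the flow control object list.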
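A sketch of the two allocation modes, assuming the total required reduction (`total_cut`, for example in IOPS) has already been computed. The mapping from priority to throttle weight in `THROTTLE_WEIGHT` is hypothetical; the text only says that different priorities mean different degrees of flow control.

```python
from __future__ import annotations

# Hypothetical mapping: high-priority objects absorb a small share of the cut,
# low-priority objects a large share.
THROTTLE_WEIGHT = {"high": 1, "medium": 2, "low": 4}


def equalization_mode(total_cut: float, objects: list[str]) -> dict[str, float]:
    """Split the required reduction evenly across all objects in the list."""
    share = total_cut / len(objects)
    return {name: share for name in objects}


def weight_mode(total_cut: float, priority_of: dict[str, str]) -> dict[str, float]:
    """Split the required reduction in proportion to each object's throttle weight."""
    weights = {name: THROTTLE_WEIGHT[p] for name, p in priority_of.items()}
    weight_sum = sum(weights.values())
    return {name: total_cut * w / weight_sum for name, w in weights.items()}
```

Cutting 700 IOPS across one high-, one medium-, and one low-priority volume gives reductions of 100, 200, and 400 in weight mode, versus about 233 each in equalization mode.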
To solve the above technical problem, the invention also provides a storage system flow control system comprising a front-end service key index monitoring module, a storage system bottom-layer key resource monitoring module, a storage system flow control algorithm module, and a front-end service flow control algorithm module;
the storage system bottom-layer key resource monitoring module monitors key index data of the bottom layer of the storage system for the current time period to determine the change trend of the bottom-layer load in the current time period and analyze whether the bottom-layer load is overloaded;
the front-end service key index monitoring module monitors key index data of the front-end service for the current time period and, when the bottom-layer load is overloaded, analyzes the change trend of front-end service performance in the current and future time periods according to that data;
the storage system flow control algorithm module, when it judges that front-end service performance in the current time period is declining by more than a preset change trend threshold, analyzes the similarity between the key index data of the bottom layer of the storage system and the key index data of the front-end service for the current time period, and issues an instruction to start flow control to the front-end service flow control algorithm module if the similarity exceeds a preset similarity threshold or front-end service performance is predicted to decline in the future time period;
the front-end service flow control algorithm module performs flow control on the front-end service.
The beneficial effects of this technical solution are as follows: the front-end service key index monitoring module, the storage system bottom-layer key resource monitoring module, the storage system flow control algorithm module, and the front-end service flow control algorithm module cooperate to implement the storage system flow control method.
Drawings
FIG. 1 is a block diagram of the storage system flow control system of the present invention;
FIG. 2 is a flow chart of the dynamic flow control algorithm of the present invention;
FIG. 3 is a flow chart of the front-end service flow control algorithm of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
Storage system flow control system embodiment:
The structure of the storage system flow control system of this embodiment is shown in FIG. 1. It comprises a storage system bottom-layer key resource monitoring module, a front-end service key index monitoring module, a storage system flow control algorithm module, and a front-end service flow control algorithm module. The functions of these modules, the connections between them, and their processing logic are as follows:
1. Storage system bottom-layer key resource monitoring module. This module monitors in real time the load of the various key resources at the bottom layer of the storage system (storage resources, network resources, and CPU and memory resources); besides hardware resources, it also monitors the load of each service module in real time. Specifically:
a) Storage resource monitoring module. Storage resources are the physical hard disks used for primary storage or as an acceleration cache, which may be HDD mechanical disks or SSD solid-state disks. The main monitoring indexes are disk utilization, request read-write latency, read-write IOPS, and read-write bandwidth. Because the load of the physical hard disks best reflects the load of the whole storage system, the storage resource load carries the highest weight in the flow control algorithm.
b) Network resource monitoring module. Network resources are the network card devices on the servers. The main monitoring indexes are network card read-write bandwidth and the number of packets sent and received by the network card.
c) CPU and memory resource monitoring module. CPU and memory resources are the CPU and memory on the storage servers. The main monitoring indexes are CPU and memory utilization.
d) Service module key index monitoring module. Service modules are the key service modules in the storage system, such as the module responsible for data consistency and data redundancy, the cache module responsible for performance acceleration, and the module that interacts with the storage media to complete data reads and writes. The main monitoring indexes depend on the function of each module. For example, the key indexes of the cache module include the current and historical water levels of the cache pool, the rate at which dirty data in the cache pool is flushed to disk, and the utilization, read-write bandwidth, and read-write IOPS of the storage media in the cache pool; for the other modules, the main index is the length of the queue in which the module receives requests.
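A sketch of how the per-resource loads might be combined into one bottom-layer load and checked for overload. The weights in `RESOURCE_WEIGHTS` are hypothetical; the only constraint taken from the text is that storage resources carry the highest weight. The 0.85 threshold mirrors the 85% overload threshold in the worked example at the end of this embodiment, and expressing a service module's load as a utilization (for example, queue length divided by queue capacity) is also an assumption.

```python
from __future__ import annotations

# Hypothetical weights; storage gets the largest share, as the text requires.
RESOURCE_WEIGHTS = {"storage": 0.5, "network": 0.2, "cpu_memory": 0.2, "modules": 0.1}
OVERLOAD_THRESHOLD = 0.85


def composite_load(utilization: dict[str, float]) -> float:
    """Weighted average of per-resource utilizations, each already scaled to [0, 1]."""
    total_weight = sum(RESOURCE_WEIGHTS[k] for k in utilization)
    return sum(RESOURCE_WEIGHTS[k] * u for k, u in utilization.items()) / total_weight


def is_overloaded(utilization: dict[str, float]) -> bool:
    """True when the composite bottom-layer load exceeds the overload threshold."""
    return composite_load(utilization) > OVERLOAD_THRESHOLD
```

With equal weights this reduces to the plain per-moment average used in the similarity analysis; unequal weights let a saturated disk pool dominate the verdict even when CPU and network are moderately loaded.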
2. Front-end service key index monitoring module. Front-end services are the interfaces or protocol modules that the storage system exposes externally, generally of three types: block, object, and file. By monitoring key service indexes, this module analyzes in real time how much the service is currently impaired based on recent data and predicts the future trend based on longer-term historical data; the right moment for flow control can be determined only if the impairment of the service is analyzed accurately. Whatever the interface, the service indexes can be abstracted into three key metrics: IOPS, bandwidth, and latency. Specifically:
a) Front-end service IOPS monitoring module. This module monitors the IOPS of the front-end service in real time; its data sampling period and the retention period of its historical data can be adjusted dynamically through the algorithm parameter configuration module.
b) Front-end service bandwidth monitoring module. This module monitors the bandwidth of the front-end service in real time; its data sampling period and the retention period of its historical data can be adjusted dynamically through the algorithm parameter configuration module.
c) Front-end service latency monitoring module. This module monitors the latency of the front-end service in real time; its data sampling period and the retention period of its historical data can be adjusted dynamically through the algorithm parameter configuration module.
d) Front-end service trend prediction module. This module determines the performance trend of the front-end service in the current time period and predicts its trend in future time periods from longer-term historical data of the three front-end indexes (IOPS, bandwidth, and latency). Trends fall into six types: falling, rising, flat, periodic fluctuation, rising with fluctuation, and falling with fluctuation. The module uses a trend prediction algorithm based on time-series analysis, mainly an ARIMA (autoregressive integrated moving average) model: each monitoring index is decomposed into a trend part, a periodic part, and a residual sequence, and the analysis result of the trend part is used to judge whether flow control is necessary. The periodic part is the periodic variation pattern of the monitoring index, whose period may be measured in minutes, hours, or days, or in weeks or months; for example, the system load may peak every day during the start-of-work rush around 8 o'clock and fall to its minimum after work ends around 6 o'clock. The trend part is the overall rising or falling tendency of the load, around which the values fluctuate within a certain range. The residual sequence is the original training sequence minus the sequence fitted on the training data; the more closely the residuals follow a random error distribution (a normal distribution with mean 0), the better the model fits.
Taking IOPS as an example, the input is a time series of the form {time 1, IOPS1}, {time 2, IOPS2}, {time 3, IOPS3} …, and the predicted output is the IOPS trend, i.e. {rising, confidence}, {falling, confidence}, or {flat, confidence}.
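The input and output format above can be illustrated with a deliberately simple stand-in. This sketch does not implement ARIMA: it substitutes an ordinary least-squares line fit, uses the sign of the slope as the trend and the fit's R² as the confidence, and the name `classify_trend` and the 1% flat band are hypothetical.

```python
from __future__ import annotations


def classify_trend(
    points: list[tuple[float, float]], flat_tolerance: float = 0.01
) -> tuple[str, float]:
    """Return (trend, confidence) for (time, value) samples.

    Trend comes from the least-squares slope; confidence is the R^2 of the line fit.
    """
    n = len(points)
    ts = [t for t, _ in points]
    vs = [v for _, v in points]
    mt, mv = sum(ts) / n, sum(vs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stv = sum((t - mt) * (v - mv) for t, v in points)
    svv = sum((v - mv) ** 2 for v in vs)
    if stt == 0:
        raise ValueError("need samples at two or more distinct times")
    slope = stv / stt
    r_squared = (stv * stv) / (stt * svv) if svv else 1.0
    # Change predicted by the fitted line over the window, relative to the mean level.
    change = (max(ts) - min(ts)) * slope
    relative = change / abs(mv) if mv else change
    if relative < -flat_tolerance:
        return "falling", r_squared
    if relative > flat_tolerance:
        return "rising", r_squared
    return "flat", r_squared
```

A linear fit cannot separate periodic from trend components, which is why the text relies on a time-series model instead; the sketch only shows the {trend, confidence} contract.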
3. Storage system flow control algorithm module. This module calculates in real time whether flow control needs to be triggered, based on the monitoring data obtained from the storage system bottom-layer key resource monitoring module and the front-end service key index monitoring module. It mainly comprises a token bucket algorithm module, a dynamic flow control algorithm based on front-end services, an index similarity analysis module, and an algorithm parameter configuration module.
a) Token bucket algorithm module. This is the basic module of the flow control algorithm. Tokens are generated at a constant rate and placed in a virtual bucket. When a front-end service request reaches the entry of the storage engine, the storage engine first tries to take a token from the bucket: if it gets one, the request is processed; otherwise, the request is rejected. Likewise, the other service modules governed by the flow control algorithm must acquire tokens when processing requests. The number of tokens is calculated from the load of the underlying key resources.
b) Dynamic flow control algorithm based on front-end services. Dynamic flow control means that the decision of when to trigger flow control is based mainly on the load of the front-end service: flow control of the front-end service is triggered only when front-end performance degrades beyond a set threshold. As long as front-end performance is not affected, the priority of each service module can be adjusted dynamically. For example, when front-end service pressure is very low or absent and a hard disk failure in the system requires data reconstruction, the priority of the reconstruction service can be raised to high so that the network bandwidth and hard disk read-write bandwidth are fully used, reconstruction finishes as soon as possible, and system reliability improves. When front-end service pressure is high and performance is declining, the reconstruction service keeps its statically configured priority so as to minimize the impact on the front-end service. Dynamic flow control has a second meaning: in a block storage scenario, for example, each volume only needs a priority and does not need a statically configured QoS (quality of service) value; when the storage system pressure keeps rising and cannot be brought under control in a short time, the QoS of each volume can be calculated dynamically from its priority, and system pressure is reduced by applying that QoS to the volume. Whether flow control of the front-end service needs to be triggered is decided by the index similarity analysis module. The specific flow in this embodiment is shown in FIG. 2:
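A minimal token bucket along the lines described above. The class name, the `set_rate` hook for re-deriving the rate from the bottom-layer load, and the injectable `clock` (used only to make the sketch testable) are illustrative assumptions, not part of the patent. Requests that find the bucket empty are rejected, as the text describes.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    """Tokens accrue at `rate` per second, up to `capacity`."""

    def __init__(
        self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        """Add the tokens generated since the last refill, capped at capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; otherwise the request is rejected."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def set_rate(self, rate: float) -> None:
        """Re-tune the generation rate, e.g. when the bottom-layer load changes."""
        self._refill()  # settle tokens earned at the old rate first
        self.rate = rate
```

Calling `_refill` before changing the rate ensures that tokens earned before the change are credited at the old rate.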
(1) Acquire the key index data of the bottom layer of the storage system, analyze the change trend of the bottom-layer load, and determine whether it is overloaded. If it is overloaded, analyze the change trend of front-end service performance in the current and future time periods according to the monitored front-end key index data and go to step (2); otherwise, repeat step (1).
(2) Analyze the change trend of front-end service performance in the current time period. If it is declining by more than a preset change trend threshold, analyze the similarity between the bottom-layer key index data and the front-end key index data for the current time period and go to step (3). Otherwise, check whether there is background service pressure such as reconstruction: if there is background service pressure and the front end has no or very little service pressure, release flow control of the background services; if there is background service pressure and the front-end service pressure is relatively high, apply flow control according to the default priorities; if there is no background service pressure, apply flow control according to the default priorities.
(3) If the similarity exceeds a preset similarity threshold or front-end service performance is predicted to decline in the future time period (this condition is not shown in FIG. 2), calculate the flow control value of the front-end service and send the decision to start front-end flow control, together with the specific flow control value, to the front-end service flow control algorithm module; otherwise, return to step (1).
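Steps (1) to (3) can be condensed into one decision function per control cycle. This is a sketch under stated assumptions: the `Observation` fields are hypothetical names for values produced by the monitoring and similarity modules, the 30% drop threshold is borrowed from the worked example in the index similarity analysis module, and the 0.7 similarity threshold is a hypothetical placeholder. The background branch follows step (2) as written, so it is reached only when the bottom layer is overloaded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    bottom_overloaded: bool       # result of the overload check in step (1)
    frontend_drop: float          # current-period performance drop, e.g. 0.35 = 35 %
    future_trend: str             # "falling", "rising" or "flat" from the forecast
    similarity: float             # Pearson coefficient, bottom layer vs front end
    background_pressure: bool     # e.g. reconstruction traffic is present
    frontend_pressure_low: bool   # front end idle or nearly idle


def decide(obs: Observation, drop_threshold: float = 0.30, sim_threshold: float = 0.7) -> str:
    """Return the action for one control cycle, following steps (1) to (3)."""
    if not obs.bottom_overloaded:
        return "keep_monitoring"                    # step (1): not overloaded, loop
    if obs.frontend_drop > drop_threshold:          # step (2): front end degraded
        if obs.similarity > sim_threshold or obs.future_trend == "falling":
            return "throttle_frontend"              # step (3): start front-end flow control
        return "keep_monitoring"                    # step (3) otherwise: back to step (1)
    if obs.background_pressure and obs.frontend_pressure_low:
        return "release_background_throttle"
    return "default_priority"
```

With the correlation of 0.8165 from the worked example and a 40% front-end drop, `decide` returns `"throttle_frontend"`; the same drop with weak correlation and a rising forecast leaves flow control off.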
c) Index similarity analysis module. This module analyzes the similarity between the change trend of the storage bottom-layer load and the change trend of front-end service performance; the overall idea is to use that similarity to decide whether a rise in bottom-layer storage load is what causes the drop in front-end service performance. The threshold for judging whether front-end service performance has dropped is configured by the system at initialization, and the similarity threshold can also be set through the algorithm parameter configuration module. Similarity is analyzed with the classical Pearson correlation coefficient, which is widely used to measure the degree of correlation between two variables. Its value lies between -1 and 1: 0 means no correlation, a negative value means negative correlation, and a positive value means positive correlation.
For example, assume the overload threshold of the storage system is 85%. Each sampling point in the sampling period is abstracted to 0 or 1, where 0 means abnormal (load above 85%) and 1 means normal, giving the sample set (1,1,1,1,1,1,0,0,0,0). Assume the front-end service is judged to have declined when its drop exceeds 30%. Each sampling point is likewise abstracted to 0 or 1, where 0 means abnormal (the drop exceeds 30%) and 1 means normal (the drop is within 30%), giving the sample set (1,1,1,1,1,0,0,0,0,0). The correlation coefficient of the two sets is 0.8165, which indicates that the decline in front-end performance is strongly correlated with the overload of the bottom storage system, so front-end flow control needs to be started.
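The 0.8165 figure can be checked by hand. The storage series has mean 0.6 and the front-end series mean 0.5; the sum of cross-deviations is 5(0.4)(0.5) + (0.4)(-0.5) + 4(-0.6)(-0.5) = 2.0, and the sums of squared deviations are 2.4 and 2.5, so r = 2.0 / sqrt(2.4 × 2.5) = 2/sqrt(6) ≈ 0.8165. A minimal pure-Python sketch of the same computation follows; returning 0.0 for a constant series (where Pearson's r is undefined) is a choice made for this sketch, not something the description specifies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # sketch's choice: a constant series carries no correlation signal
    return cov / (sx * sy)


# 1 = normal, 0 = abnormal (load above 85% / front-end drop above 30%)
storage = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
frontend = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(round(pearson(storage, frontend), 4))  # 0.8165
```

On Python 3.10 and later, `statistics.correlation(storage, frontend)` gives the same value.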
Because the storage system bottom layer has many key indexes, each index is normalized, and the normalized indexes are then averaged to obtain the load of the storage system bottom layer. The key index of the front-end service (only one, either IOPS or bandwidth) is normalized as well. A correlation analysis algorithm (for example, the Pearson correlation coefficient described in the previous paragraph) is then applied to the normalized, averaged bottom-layer load and the normalized front-end key index data to decide whether front-end flow control needs to be started. Although the indexes differ (IOPS or bandwidth for the front-end service, the utilization of a given resource for the bottom layer), normalization works the same way for both: it checks whether the rise or fall of the index is within a predefined range. If it is, the change is small and the value is normalized to 1; otherwise it is normalized to 0.
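The normalize-then-average step can be sketched as follows. The description does not say what the "rise or fall amplitude" is measured against, so this sketch assumes a relative change against a per-index baseline; `binarize`, `bottom_layer_load` and the example indexes and thresholds are hypothetical. The resulting load series and the binarized front-end series are what the Pearson step above would compare.

```python
def binarize(series, baseline, max_change):
    """Map each sample to 1 (change within max_change of baseline) or 0 (outside it).

    Assumption: "amplitude" is the relative change against a per-index baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [1 if abs(v - baseline) / baseline <= max_change else 0 for v in series]


def bottom_layer_load(indexes):
    """Average the binarized bottom-layer indexes sample by sample.

    `indexes` maps an index name to (samples, baseline, max_change).
    """
    rows = [binarize(s, b, m) for s, b, m in indexes.values()]
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all indexes must have the same number of samples")
    return [sum(r[t] for r in rows) / len(rows) for t in range(n)]


# Hypothetical samples: two bottom-layer utilization indexes and front-end IOPS.
load = bottom_layer_load({
    "disk_util": ([50, 52, 80, 90], 50, 0.2),
    "cpu_util": ([40, 41, 42, 70], 40, 0.2),
})
frontend = binarize([1000, 950, 600, 500], 1000, 0.3)
print(load, frontend)  # [1.0, 1.0, 0.5, 0.0] [1, 1, 0, 0]
```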
d) Algorithm parameter configuration module. This module makes it easy to tune the sensitivity of the algorithm, letting system maintenance personnel set different parameters for different hardware configurations and for how sensitive the front-end service is to performance degradation. Configurable parameters include: (1) token bucket parameters: the token issuing interval, the timeout for acquiring tokens, and the maximum number of tokens acquired at a time; (2) parameters of each monitoring module: the sampling period of the monitored indexes, the retention period of historical data, and the threshold at which front-end service performance is judged to be degraded; (3) parameters of the dynamic flow control algorithm: the storage system load threshold that triggers reducing front-end service pressure, the sampling period for monitoring the bottom storage load, the configuration items of each plug-in in the QoS policy selection module, the priority of each front-end service object, the black-and-white list of service objects, the QoS values of each area of the front-end service objects, and the priority of each service module; (4) parameters of the index association analysis module: the association threshold between the two groups of indexes and the number of samples used by the association analysis algorithm.
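The three token bucket parameters in item (1) can be read as follows in a minimal sketch. `refill_interval`, `acquire_timeout` and `max_per_acquire` correspond to the listed parameters; `capacity` and `refill_amount` are hypothetical extras needed to make the bucket complete. The clock and sleep functions are injectable so the behaviour can be exercised without real waiting. This is a generic token bucket, not necessarily the one the patent uses.

```python
import time


class TokenBucket:
    """Token bucket driven by the configurable parameters in item (1).

    refill_interval: seconds between token issues (refill_amount tokens per issue)
    acquire_timeout: how long acquire() may wait for tokens
    max_per_acquire: cap on the tokens a single acquire() may take
    capacity, refill_amount: hypothetical extras needed to complete the sketch
    """

    def __init__(self, capacity, refill_amount, refill_interval, acquire_timeout,
                 max_per_acquire, clock=time.monotonic, sleep=time.sleep):
        self.capacity = capacity
        self.refill_amount = refill_amount
        self.refill_interval = refill_interval
        self.acquire_timeout = acquire_timeout
        self.max_per_acquire = max_per_acquire
        self.clock, self.sleep = clock, sleep
        self.tokens = capacity
        self.last_refill = clock()

    def _refill(self):
        # Issue one batch of tokens for every full interval that has elapsed.
        now = self.clock()
        intervals = int((now - self.last_refill) // self.refill_interval)
        if intervals > 0:
            self.tokens = min(self.capacity, self.tokens + intervals * self.refill_amount)
            self.last_refill += intervals * self.refill_interval

    def acquire(self, n=1):
        """Take n tokens, waiting up to acquire_timeout; return False on timeout."""
        if n > self.max_per_acquire:
            raise ValueError("request exceeds max tokens per acquire")
        deadline = self.clock() + self.acquire_timeout
        while True:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return True
            if self.clock() >= deadline:
                return False
            # Sleep until the next issue or the deadline, whichever comes first.
            self.sleep(min(self.refill_interval, deadline - self.clock()))
```

A shorter `refill_interval` or larger `refill_amount` lets more I/O through; a shorter `acquire_timeout` makes throttled requests fail fast instead of queueing.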
4. Front-end service flow control algorithm. When the storage system flow control algorithm module determines that QoS must be set automatically for the front-end service, the QoS policy selection module automatically selects which objects (volumes, buckets, or directories) need their IOPS or bandwidth limited, and by how much for each object. The module contains several selection strategies and evaluates candidates along several dimensions to keep the result reasonable. It mainly consists of the following plug-ins:
a) Control strategy based on a black-and-white list: this strategy lets the user set a whitelist. An object on the whitelist corresponds to a very high-priority service whose performance should not be degraded at all, even under high storage pressure.
b) Static control strategy of the front-end service: this strategy lets the user set a static QoS on an object. Once set, the static QoS takes precedence over the dynamically calculated QoS value. For example, if the static QoS of an object is 1000 IOPS and the clipped IOPS calculated from the volume's service priority and the clipping strategy is 800, the IOPS should be set to 1000; that is, the IOPS or bandwidth after clipping cannot be lower than the static QoS value.
c) Control strategy based on the periodic characteristics of services: this strategy collects and stores the periodic characteristics of front-end services, including how their IOPS and bandwidth vary over a day, a week, and a month. For the daily characteristics, the trend of each hour is calculated from one-minute samples; the trend is one of rising, falling, flat, periodic fluctuation, fluctuating rise, and fluctuating fall. Besides the trends of the service indexes, the peak and valley periods of the service within a day are also analyzed. The analyses for the other periods (week and month) produce results in the same way as the daily analysis. Based on an object's periodic characteristics, the QoS policy selection algorithm can refer to the object's historical trend and delay setting QoS, or not set it at all.
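One way to produce the hourly trends and the daily peak and valley hours is sketched below, assuming a least-squares line per hour: the fitted change over the hour decides rising, falling or flat, and the spread of the residuals decides whether the trend is "fluctuating". The tolerances `flat_tol` and `noise_tol` are hypothetical. The sketch does not try to tell periodic fluctuation apart from irregular fluctuation (that would need something like an autocorrelation check), so it reports both as "fluctuating".

```python
def slope(ys):
    """Least-squares slope of ys against sample index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((i - mx) * (y - my) for i, y in enumerate(ys))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den


def classify_hour(samples, flat_tol=0.05, noise_tol=0.1):
    """Classify one hour of per-minute IOPS or bandwidth samples.

    flat_tol: fitted change over the hour (relative to the mean) below which it is flat.
    noise_tol: residual RMS (relative to the mean) above which it is fluctuating.
    Both thresholds are hypothetical.
    """
    n = len(samples)
    mean = sum(samples) / n
    if mean == 0:
        return "flat"
    b = slope(samples)
    a = mean - b * (n - 1) / 2
    residual = (sum((y - (a + b * i)) ** 2 for i, y in enumerate(samples)) / n) ** 0.5
    drift = b * (n - 1) / mean
    noisy = residual / mean > noise_tol
    if abs(drift) < flat_tol:
        return "fluctuating" if noisy else "flat"
    direction = "rising" if drift > 0 else "falling"
    return f"fluctuating {direction}" if noisy else direction


def daily_profile(minute_samples):
    """Split 1440 per-minute samples into 24 hours; return trends, peak hour, valley hour."""
    if len(minute_samples) != 24 * 60:
        raise ValueError("expected one day of per-minute samples")
    hours = [minute_samples[h * 60:(h + 1) * 60] for h in range(24)]
    means = [sum(h) / 60 for h in hours]
    peak = max(range(24), key=lambda h: means[h])
    valley = min(range(24), key=lambda h: means[h])
    return [classify_hour(h) for h in hours], peak, valley
```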
d) Clipping strategy: this strategy lets the user choose one of two clipping modes, weight mode or equalization mode. In weight mode, the system calculates each object's clipped IOPS and bandwidth according to its service priority; in equalization mode, the IOPS and bandwidth to be clipped are spread evenly over the objects.
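The two clipping modes differ only in how a total cut is split. In the sketch below, equalization mode divides the cut evenly. For weight mode the description says only that the split follows service priority; the sketch assumes each object's share is proportional to 1/priority, so that more important objects (larger priority numbers here) give up less. That weighting formula is an assumption, as are the function names.

```python
def clip_equal(total_cut, objects):
    """Equalization mode: spread total_cut evenly over the objects."""
    if not objects:
        return {}
    share = total_cut / len(objects)
    return {name: share for name in objects}


def clip_weighted(total_cut, priorities):
    """Weight mode: lower-priority objects absorb a larger share of the cut.

    priorities maps object -> priority (larger number = more important).
    Assumption: each object's weight is 1/priority; the source does not fix the formula.
    """
    if any(p <= 0 for p in priorities.values()):
        raise ValueError("priorities must be positive")
    weights = {name: 1.0 / p for name, p in priorities.items()}
    total = sum(weights.values())
    return {name: total_cut * w / total for name, w in weights.items()}


# Cut 900 IOPS in total across three hypothetical volumes.
print(clip_equal(900, ["vol-a", "vol-b", "vol-c"]))
print(clip_weighted(900, {"gold": 3, "silver": 2, "bronze": 1}))
```

With priorities 3, 2 and 1, the weights are 1/3, 1/2 and 1, so "bronze" absorbs 6/11 of the cut and "gold" only 2/11.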
The flow of the front-end service flow control algorithm that combines these plug-ins is shown in fig. 3 and proceeds as follows:
(1) When the storage system flow control algorithm module determines that QoS must be set automatically for the front-end service, the QoS policy selection module automatically selects which objects (volumes, buckets, or directories) need their IOPS or bandwidth limited and checks whether each object is in the whitelist: QoS is not set (i.e., no flow control is applied) for whitelisted objects; otherwise, execute step (2) to check whether a static QoS value has been set.
(2) For an object with a static QoS value, the static QoS is kept as set; otherwise, execute step (3).
(3) Add the objects that are neither whitelisted nor given a static QoS to the list of flow control objects awaiting QoS, assign a performance priority to each object in the list (different priorities mean different degrees of flow control), and execute step (4) to determine which clipping strategy is in use.
(4) Determine whether the clipping strategy is equalization mode or weight mode. In equalization mode, the IOPS or bandwidth each object must give up is calculated by even allocation, and step (5) is executed. In weight mode, the IOPS and bandwidth to be clipped are calculated from each object's performance priority, the QoS value is then fine-tuned using monitored indexes such as the object's latency, and step (5) is executed. The fine-tuning works as follows: the larger the latency, the smaller the clipping ratio; the size of the adjustment is expressed as a percentage and kept within 3%, where the percentage is taken relative to the QoS value calculated in weight mode.
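The latency-based fine-tuning in step (4) can be sketched as follows, assuming the adjustment raises each weight-mode QoS limit (i.e. clips less) in proportion to the object's latency relative to the slowest object, capped at 3% of the weight-mode value. The proportional scaling is an assumption; the description states only the direction of the adjustment and its 3% bound.

```python
def fine_tune(qos, latency_ms, max_adjust=0.03):
    """Nudge weight-mode QoS limits using observed latency.

    Higher latency -> smaller reduction -> the limit is raised, by at most
    max_adjust (3%) of the weight-mode QoS value. Scaling by latency relative
    to the slowest object is an assumption made for this sketch.
    """
    worst = max(latency_ms.values())
    if worst <= 0:
        return dict(qos)  # no latency signal: keep the weight-mode values
    return {name: value * (1 + max_adjust * latency_ms[name] / worst)
            for name, value in qos.items()}


tuned = fine_tune({"vol-a": 1000, "vol-b": 1000}, {"vol-a": 10, "vol-b": 5})
print(tuned)  # vol-a raised by 3%, vol-b by 1.5%
```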
(5) Determine whether periodic characteristic data exist. If not, the QoS limit is applied immediately; if they do and the pressure is on a rising trend, the QoS limit is also applied immediately; if the pressure is on a falling trend, step (3) is executed again. Periodic characteristic data are service characteristics accumulated through long-term monitoring. For example, a certain period on Monday may be a service peak that lasts only one or two hours, after which the pressure drops; in this case the system recovers on its own even without flow control.
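Steps (1) to (5) can be strung together roughly as follows. This is a simplified sketch under several assumptions: `VolumeState` and `plan_qos` are hypothetical names; static-QoS objects are simply left at their static value; weight mode uses the same 1/priority share as a pure assumption; the latency fine-tuning of step (4) is omitted; and an object whose history shows a falling trend is deferred to the next round (read here as "re-execute step (3)") without its share being redistributed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeState:
    """Hypothetical per-object state; field names are illustrative only."""
    name: str
    current_iops: float
    priority: int = 1                       # larger = more important
    whitelisted: bool = False
    static_qos: Optional[float] = None      # user-set static QoS, if any
    periodic_trend: Optional[str] = None    # "rising", "falling", or None if no history


def plan_qos(objects, total_cut, mode="weight"):
    """Return {object name: IOPS limit} for the objects limited in this round."""
    # Steps (1)-(3): skip whitelisted objects; static-QoS objects keep their static value.
    candidates = [o for o in objects if not o.whitelisted and o.static_qos is None]
    if not candidates:
        return {}
    # Step (4): equal shares, or shares inversely proportional to priority (assumption).
    if mode == "equal":
        shares = {o.name: 1.0 for o in candidates}
    else:
        shares = {o.name: 1.0 / o.priority for o in candidates}
    total_share = sum(shares.values())
    limits = {}
    for o in candidates:
        # Step (5): a historically falling trend means the peak should pass by itself,
        # so this object is deferred instead of limited now.
        if o.periodic_trend == "falling":
            continue
        cut = total_cut * shares[o.name] / total_share
        limits[o.name] = max(o.current_iops - cut, 0.0)
    return limits


volumes = [
    VolumeState("db", 5000, priority=3),
    VolumeState("log", 3000, priority=1),
    VolumeState("vip", 8000, whitelisted=True),
    VolumeState("fixed", 2000, static_qos=1000),
    VolumeState("batch", 4000, priority=1, periodic_trend="falling"),
]
print(plan_qos(volumes, 1000))  # only "db" and "log" receive limits this round
```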
In general, the method has the following features: 1) by dynamically sensing the performance trend of the front-end service, it adjusts the priority of other services as long as the front-end service is not harmed, so system resources are fully used and excessive flow control is effectively prevented; 2) when the storage system load must be reduced by limiting front-end service pressure, the front-end service flow control algorithm reduces the pressure strategically and ensures that objects with different priorities are allocated different amounts of resources.
Embodiment of the storage system flow control method:
The overall idea of the method is as follows: when the storage system bottom layer is overloaded and front-end service performance is judged to have declined to a certain extent, the similarity between the key index data of the bottom layer and that of the front-end service is analyzed, and front-end flow control is started only when the decline in front-end performance is strongly correlated with the bottom-layer overload, which ensures that front-end flow control is effective. When flow control is actually performed, several control strategies are combined, including the black-and-white-list strategy, the service-priority strategy, and the static control strategy of the front-end service. The control logic for starting front-end flow control and the logic for carrying it out are shown in the flow charts of fig. 2 and fig. 3; the implementation is described in detail in the storage system flow control system embodiment and is not repeated here.
Claims (9)
1. A method for flow control in a storage system, comprising the steps of:
1) Monitoring key index data of the storage system bottom layer in the current time period to determine the change trend of the bottom-layer load in the current time period and to analyze whether the bottom-layer load is overloaded; if it is overloaded, analyzing the change trend of front-end service performance in the current time period and the future time period according to the monitored key index data of the front-end service in the current time period;
2) If front-end service performance is declining in the current time period and the decline exceeds a preset change trend threshold, analyzing the similarity between the key index data of the storage system bottom layer and the key index data of the front-end service in the current time period, and starting flow control of the front-end service if the similarity exceeds a preset similarity threshold or front-end service performance is declining in the future time period.
2. The method according to claim 1, wherein the key indexes of the bottom layer of the storage system in step 1) include key indexes of each key resource and each business module;
the key resources comprise at least two of storage resources, network resources, and CPU and memory resources; the key indexes of the storage resources comprise at least one of disk utilization, request read/write latency, read/write IOPS, and read/write bandwidth; the key indexes of the network resources comprise at least one of network card read/write bandwidth and the number of packets sent and received by the network card; the key indexes of the CPU and memory resources comprise CPU and memory utilization;
the business modules comprise at least one of a business module that handles data consistency and data redundancy, a cache module that accelerates performance, and a business module that interacts with the storage medium to complete data reads and writes; the key indexes of the cache module comprise at least one of the current and historical watermark of the cache pool, the dirty data flush rate of the cache pool, the utilization of the storage medium in the cache pool, and the read/write bandwidth and read/write IOPS of the storage medium in the cache pool.
3. The method for flow control of a storage system according to claim 1, wherein the change trend of front-end service performance in the future time period in step 1) is analyzed as follows: the time series of the front-end service key index in the current time period is fitted with an ARIMA (autoregressive integrated moving average) model to predict the change trend of the key index in the future time period, which is taken as the change trend of front-end service performance in the future time period.
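The claim does not give the ARIMA order, so the sketch below assumes ARIMA(1,1,0): the series is differenced once, an AR(1) model with intercept is fitted to the differences by conditional least squares, and the fitted difference equation is iterated forward. It is hand-rolled to stay dependency-free; in practice a library implementation such as statsmodels' `ARIMA` would normally be used. The "declining" label and the 5% tolerance are hypothetical choices for turning the forecast into a trend.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) by conditional least squares.

    Model: d_t = c + phi * d_{t-1} + e_t, where d_t = series[t] - series[t-1].
    Returns (c, phi, last difference).
    """
    d = [b - a for a, b in zip(series, series[1:])]
    if len(d) < 3:
        raise ValueError("need at least 4 observations")
    x, y = d[:-1], d[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    phi = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    c = my - phi * mx
    return c, phi, d[-1]


def forecast(series, steps):
    """Forecast the next `steps` levels by iterating the fitted difference equation."""
    c, phi, last_d = fit_arima_110(series)
    level, out = series[-1], []
    for _ in range(steps):
        last_d = c + phi * last_d
        level += last_d
        out.append(level)
    return out


def future_trend(series, steps=5, tol=0.05):
    """Label the forecast relative to the last observation (tol is hypothetical)."""
    base = series[-1]
    if base == 0:
        raise ValueError("last observation must be non-zero")
    change = (forecast(series, steps)[-1] - base) / base
    if change < -tol:
        return "declining"
    if change > tol:
        return "rising"
    return "flat"


iops = [1000 - 50 * i for i in range(10)]  # steadily falling front-end IOPS
print(forecast(iops, 5), future_trend(iops))
```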
4. The method for flow control of a storage system according to claim 1, wherein, if the key indexes of the storage system bottom layer include a plurality of indexes, the similarity between the key index data of the bottom layer and the key index data of the front-end service in the current time period is analyzed as follows: first, each key index of the bottom layer in the current time period is normalized; then, for each moment, all normalized key index data of the bottom layer at that moment are averaged to obtain the bottom-layer load at that moment, which gives the bottom-layer load at every moment of the current time period; next, the key index data of the front-end service in the current time period are normalized; finally, a correlation analysis algorithm is applied to the bottom-layer load at each moment of the current time period and the normalized front-end key index data.
5. The method according to claim 4, wherein the correlation analysis algorithm in step 2) is the Pearson correlation coefficient method.
6. The method according to claim 1, wherein, if the condition that front-end service performance is declining in the current time period beyond the preset change trend threshold is not satisfied, background service pressure exists, and the front-end service pressure is below a preset pressure threshold, the flow control of the background service is released.
7. The method for flow control of a storage system according to claim 1, wherein the flow control of the front-end service in step 2) is performed as follows:
screening out the objects of the front-end service to be flow-controlled, and adding an object to the list of flow control objects awaiting QoS if it is not in the whitelist and has no static QoS value; wherein objects in the whitelist are not flow-controlled, and for an object with a static QoS value, the QoS value after flow control cannot be lower than the static QoS value;
and performing flow control on each object in the flow control object list.
8. The method for flow control of a storage system according to claim 7, wherein flow control is performed on each object in the flow control object list according to a configured weight mode or equalization mode;
the weight mode is: setting a performance priority for each object in the flow control object list, different priorities representing different degrees of flow control, and performing flow control on each object according to its priority;
the equalization mode is as follows: the total required flow control amount is evenly distributed to each object in the flow control object list.
9. The storage system flow control system is characterized by comprising a front-end business key index monitoring module, a storage system bottom layer key resource monitoring module, a storage system flow control algorithm module and a front-end business flow control algorithm module;
the storage system bottom layer key resource monitoring module is used for monitoring key index data of the storage system bottom layer in the current time period, so as to determine the change trend of the bottom-layer load in the current time period and analyze whether the bottom-layer load is overloaded;
the front-end business key index monitoring module is used for monitoring key index data of the front-end service in the current time period and, when the storage system bottom layer is overloaded, analyzing the change trend of front-end service performance in the current time period and the future time period according to the monitored data;
the storage system flow control algorithm module is used for analyzing, when front-end service performance is judged to be declining in the current time period beyond a preset change trend threshold, the similarity between the key index data of the storage system bottom layer and the key index data of the front-end service in the current time period, and for issuing a flow control start instruction to the front-end service flow control algorithm module if the similarity exceeds a preset similarity threshold or front-end service performance is declining in the future time period;
the front-end service flow control algorithm module is used for performing flow control on the front-end service.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311090387.XA CN117032986A (en) | 2023-08-28 | 2023-08-28 | Storage system flow control method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311090387.XA CN117032986A (en) | 2023-08-28 | 2023-08-28 | Storage system flow control method and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117032986A true CN117032986A (en) | 2023-11-10 |
Family
ID=88641095
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311090387.XA Pending CN117032986A (en) | 2023-08-28 | 2023-08-28 | Storage system flow control method and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117032986A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118509327A (en) * | 2024-07-22 | 2024-08-16 | 海马云(天津)信息技术有限公司 | Download bandwidth adjustment method and device, electronic device and storage medium |
- 2023-08-28 CN202311090387.XA (CN117032986A): active, Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20050154576A1 (en) |  | Policy simulator for analyzing autonomic system management policy of a computer system |
| CN110289994B (en) |  | Cluster capacity adjusting method and device |
| US20120221730A1 (en) |  | Resource control system and resource control method |
| CN119739498A (en) |  | Intelligent resource scheduling and dynamic data priority management method and system |
| CN120029762A (en) |  | Adaptive load balancing method and system based on server status analysis |
| WO2025044606A1 (en) |  | Flow control method of distributed storage system, and distributed storage system |
| CN118245234B (en) |  | Distributed load balancing method and system based on cloud computing |
| CN112165436A (en) |  | Flow control method, device and system |
| CN110543355A (en) |  | Method for automatically balancing cloud platform resources |
| CN117880291A (en) |  | Multi-cloud resource load balancing method, device, equipment and medium based on GPT technology |
| CN116467082A (en) |  | A resource allocation method and system based on big data |
| CN117032986A (en) |  | Storage system flow control method and system |
| CN114115702A (en) |  | Storage control method, device, storage system and storage medium |
| CN114020218B (en) |  | Hybrid de-duplication scheduling method and system |
| CN118838759B (en) |  | A computing power data operation and maintenance processing method and system based on AI intelligence |
| CN118567911B (en) |  | Real-time data backup and recovery method for solid state disk |
| CN104899072A (en) |  | Fine-grained resource dispatching system and fine-grained resource dispatching method based on virtualization platform |
| CN119645622A (en) |  | Dynamic load balancing system and method for multi-cluster intelligent expansion |
| CN117097646A (en) |  | Tail delay adjustment method and device |
| CN110727518A (en) |  | Data processing method and related equipment |
| CN120631275B (en) |  | Distributed storage management system for real estate data |
| CN116303326B (en) |  | A distributed file system directory quota optimization method, system, device and medium |
| CN119917549B (en) |  | Health big data platform architecture optimization method and system based on cloud computing |
| CN113626140A (en) |  | Virtual machine adjusting method and related device |
| CN119697123B (en) |  | Distributed system hierarchical speed limiting method and device based on real-time load sensing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | PB01 | Publication |  |
|  | SE01 | Entry into force of request for substantive examination |  |