CN113722072A - Storage system file merging method and device based on intelligent distribution - Google Patents

Storage system file merging method and device based on intelligent distribution

Info

Publication number
CN113722072A
Authority
CN
China
Prior art keywords
water level; cache pool; front-end service; historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111074845.1A
Other languages
Chinese (zh)
Other versions
CN113722072B (en)
Inventor
杨宁
周文明
曹羽中
魏洪锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huarui Index Cloud Technology Shenzhen Co ltd
Original Assignee
Huarui Index Cloud Henan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huarui Index Cloud Henan Technology Co ltd filed Critical Huarui Index Cloud Henan Technology Co ltd
Priority to CN202111074845.1A priority Critical patent/CN113722072B/en
Publication of CN113722072A publication Critical patent/CN113722072A/en
Application granted granted Critical
Publication of CN113722072B publication Critical patent/CN113722072B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • G06F2209/508Monitor


Abstract

The invention belongs to the technical field of storage and relates to a storage system file merging method and device based on intelligent splitting. When the cache pool water level is high and the data pool load is moderate, the method compares the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure. When the similarity is high, it predicts the historical contemporaneous cache pool water level trend and decides whether to split according to the prediction; when the similarity is low, it predicts the current cache pool water level trend and decides whether to split according to that prediction. This prevents system performance bottlenecks and keeps the merging service running smoothly, so that even cheaper SATA or SAS SSDs can approach the performance of NVMe SSDs, effectively reducing cost. During splitting, the split bandwidth is adjusted automatically according to the configured split ratio, keeping front-end service latency stable in high-pressure scenarios.

Description

Storage system file merging method and device based on intelligent distribution
Technical Field
The invention belongs to the technical field of storage, and particularly relates to a storage system file merging method and device based on intelligent distribution.
Background
In current distributed storage products, storing massive numbers of small files is a well-known technical challenge. There are three main difficulties: first, storing massive small files generates frequent small IOs on the storage medium, resulting in low storage performance; second, the space small files actually occupy on the storage medium is much larger than their real size, wasting storage space; third, full-text retrieval performance is low after massive small files have been stored.
To solve the first two problems, storage vendors in the industry have introduced their own small-file merging schemes. The mainstream approach places a cache pool (high performance, low capacity) created on SSD storage media in front of the data pool (low performance, high capacity) created on HDD storage media; data is merged in the cache pool and then migrated to the data pool. Implementations fall into two main types: online merging and offline merging.
The online merging scheme is illustrated in fig. 1 and can be summarized as follows: after small files are received by the storage system, they are merged in memory and the merged large file is written into the cache pool; a background asynchronous task then reads the large file from the cache pool and writes it into the data pool.
The offline merging scheme is illustrated in fig. 2 and can be summarized as follows: small files are written into the cache pool in the order they arrive at the storage system; a background asynchronous task then reads a number of small files from the cache pool, merges them in memory into one large file, and writes the large file into the data pool.
In the online merging scheme, data has already been merged into large files by the time it lands on the SSD storage medium, so write performance is better. In the offline merging scheme, data writing and merging generate multiple writes and multiple reads on the SSD medium, so write performance is worse. The online scheme therefore has a performance advantage over the offline scheme. However, because the cache pool's capacity is small while the data pool's is large, the total capacity of the SSD cache media is far smaller than that of the HDD main storage media. In particular, with the high-capacity hard disks and high-density servers widely used in mainstream distributed storage systems, the total SSD bandwidth is much smaller than the total HDD bandwidth, the total bandwidth the cache pool can provide is limited, and the SSD easily becomes the performance bottleneck of the whole system; the hardware's performance cannot be exploited effectively, and the front-end service becomes blocked.
Disclosure of Invention
The invention provides a storage system file merging method and device based on intelligent splitting, to solve the problem in the prior art that front-end services become blocked.
To solve the above technical problem, the technical scheme and its corresponding beneficial effects are as follows:
the invention relates to a storage system file merging method based on intelligent distribution, which comprises the following steps:
1) When small files need to be merged, obtain the cache pool water level and the data pool load, and judge whether the cache pool water level exceeds a water level threshold and whether the data pool load exceeds a load threshold. The data pool load is the utilization rate of the storage media in the data pool; the cache pool water level is the total amount of dirty data in the cache pool divided by the total capacity of the cache pool, where dirty data is data that has been written into the cache pool but not yet migrated to the data pool.
2) When the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, obtain the recent front-end service pressure and compare its similarity with the historical contemporaneous front-end service pressure:
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is less than or equal to a similarity threshold, predict the current cache pool water level trend from the recent cache pool water level, and choose splitting or non-splitting according to the prediction result;
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is greater than the similarity threshold, predict the historical contemporaneous cache pool water level trend from the historical contemporaneous cache pool water level, and choose splitting or non-splitting according to the prediction result.
The front-end service pressure is the amount of data written into the cache pool by the front-end service per second. Non-splitting means the small files are merged directly in the cache pool; splitting means the small files are distributed between the cache pool and the data pool for merging, according to the configured split ratio.
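The decision flow of steps 1) and 2), including the fallback described in the later refinements, can be sketched as follows. The threshold values and the `similarity`/`predict_trend` helpers are illustrative assumptions standing in for the trained similarity model and trend prediction unit; this is not the patented implementation.

```python
def should_split(recent_levels, historical_levels, data_pool_utilization,
                 recent_pressure, historical_pressure,
                 similarity, predict_trend,
                 water_level_threshold=0.7, load_threshold=0.8,
                 similarity_threshold=0.95):
    """Decide whether small files are split between cache pool and data pool.

    recent_levels / historical_levels: cache pool water level samples in [0, 1].
    similarity(a, b) -> [0, 1] and predict_trend(series) -> "rising"/"falling"
    are stand-ins for the patent's similarity model and trend prediction unit.
    """
    # Step 1): high water level AND moderate data pool load are both required
    water_level = recent_levels[-1]
    if water_level <= water_level_threshold or data_pool_utilization > load_threshold:
        return False  # non-splitting: merge in the cache pool only

    # Step 2): compare recent vs. historical contemporaneous service pressure
    if similarity(recent_pressure, historical_pressure) > similarity_threshold:
        # High similarity: predict the historical contemporaneous level trend
        if predict_trend(historical_levels) == "rising":
            return True
        # Falling historical trend: temporarily no split, check current trend
    return predict_trend(recent_levels) == "rising"
```

With a trivial similarity and slope-based trend helper, a rising historical trend under similar pressure yields splitting, while a low water level short-circuits to non-splitting.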
The beneficial effects of the above technical scheme are as follows. When the cache pool water level is high and the data pool load is moderate, the method compares the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure; when the similarity is high it predicts the historical contemporaneous cache pool water level trend and decides whether to split accordingly, and when the similarity is low it predicts the current cache pool water level trend and decides whether to split accordingly. This yields an intelligent splitting method that jointly uses the pressure similarity, the historical contemporaneous water level prediction, and the current water level prediction. It prevents system performance bottlenecks, keeps the merging service running smoothly, and stabilizes front-end service latency under high pressure, so that even cheaper SATA or SAS SSDs can approach NVMe SSD performance, effectively reducing cost. During splitting, the split bandwidth is adjusted automatically according to the configured split ratio, further ensuring latency stability of the front-end service in high-pressure scenarios.
Further, in step 2), if the prediction of the historical contemporaneous cache pool water level trend is a rising trend, splitting processing is performed.
Further, in step 2), if the prediction of the historical contemporaneous cache pool water level trend is a falling trend, splitting is temporarily withheld; the current cache pool water level trend is then predicted from the recent cache pool water level, and splitting or non-splitting is chosen according to that prediction.
Further, in step 2), if the prediction of the current cache pool water level trend is a rising trend, splitting processing is performed; if it is a falling trend, non-splitting processing is performed.
Further, in step 1), if the cache pool water level does not exceed the water level threshold or the data pool load exceeds the load threshold, non-splitting processing is performed.
Further, to accurately calculate the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure, a similarity calculation model based on the Pearson correlation coefficient and/or the Euclidean distance is used for the comparison.
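As a minimal sketch, the Pearson correlation between two pressure series can be computed as follows; the bandwidth values are made-up illustrations, and the 0.95 cutoff mirrors the similarity threshold described later in this document.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-interval write bandwidth (MB/s) for the same daily window
recent_pressure = [120, 180, 260, 310, 290, 220]
historical_pressure = [115, 175, 255, 320, 285, 215]

r = pearson(recent_pressure, historical_pressure)  # r in [-1, 1]
similar = r > 0.95  # illustrative "high similarity" cutoff
```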
Further, to accurately predict the current and historical contemporaneous cache pool water level trends, a time series analysis model is used to predict the historical contemporaneous trend from the historical contemporaneous cache pool water level, or the current trend from the recent cache pool water level; the time series analysis model is trained on historical cache pool water level trend data.
Further, to calculate the split ratio accurately and thereby ensure stable front-end service latency in high-pressure scenarios, the split ratio is calculated from a split evaluation model together with the current front-end service pressure, the current cache pool water level, and the current data pool load. The model's inputs are the front-end service pressure, the cache pool water level, and the data pool load, each multiplied by its corresponding weight; its outputs include the split ratio. The model is trained on historical front-end service pressure, historical cache pool water level, historical data pool load, and the split ratio that was in effect at the time.
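A toy illustration of the weighted inputs just described; the linear scoring rule, the clamping, and the normalization to [0, 1] are assumptions made for illustration, since the patent's model is trained from historical data rather than hand-written.

```python
# Illustrative weights (the description later suggests e.g. 30%/40%/40%)
W_PRESSURE, W_LEVEL, W_LOAD = 0.3, 0.4, 0.4

def split_ratio(pressure_norm, cache_level, data_load):
    """Toy stand-in for the trained split evaluation model.

    Each of the three key indicators, normalized to [0, 1], is multiplied
    by its weight; the weighted sum is clamped to [0, 1] and read as the
    fraction of small files merged directly in the data pool.
    """
    score = (pressure_norm * W_PRESSURE + cache_level * W_LEVEL
             + data_load * W_LOAD)
    return max(0.0, min(1.0, score))
```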
Further, to ensure system stability, the following is performed before step 1): calculate in real time the maximum front-end service pressure that can currently be borne, and judge whether the current front-end service pressure exceeds it. If it does, trigger the front-end service flow control mechanism to limit the write bandwidth of the front-end service; if it does not, perform step 1).
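This pre-check can be sketched as a simple gate; the function name, units, and the convention of returning a bandwidth cap are illustrative assumptions.

```python
def admit_write(current_pressure_mbps, max_pressure_mbps):
    """Front-end flow-control gate.

    Returns the bandwidth cap (MB/s) to apply when the current front-end
    pressure exceeds the sustainable maximum, or None when no throttling
    is needed and step 1) may proceed.
    """
    if current_pressure_mbps > max_pressure_mbps:
        return max_pressure_mbps  # limit front-end write bandwidth
    return None
```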
The storage system file merging device based on intelligent splitting of the invention comprises a memory and a processor, the processor being configured to execute instructions stored in the memory to implement the storage system file merging method based on intelligent splitting described above, achieving the same beneficial effects as the method.
Drawings
FIG. 1 is a schematic diagram of a prior art on-line consolidation scheme;
FIG. 2 is a schematic diagram of a prior art offline consolidation scheme;
FIG. 3 is a schematic diagram of the structure of the file merge system of the present invention;
FIG. 4 is a schematic structural diagram of an intelligent shunting module of the present invention;
FIG. 5 is a flow chart of a storage system file merging method based on intelligent splitting according to the present invention;
fig. 6 is a structural diagram of a storage system file merging device based on intelligent splitting according to the present invention.
Detailed Description
The basic concept of the invention is as follows: the data pool is used to share the small-file merging pressure of the cache pool, so that some small files are merged directly in the data pool without passing through the cache pool; this is the "splitting" referred to throughout. On this basis, when the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, the method jointly uses the similarity between the recent and historical contemporaneous front-end service pressure, the prediction of the historical contemporaneous cache pool water level trend, and the prediction of the current cache pool water level trend to decide whether to split. Splitting distributes the small files between merging in the cache pool and merging in the data pool according to the configured split ratio.
The following describes a storage system file merging method based on intelligent splitting and a storage system file merging device based on intelligent splitting in detail with reference to the accompanying drawings and embodiments.
The method comprises the following steps:
in order to implement the storage system file merging method based on intelligent splitting of the present invention, a file merging system as shown in fig. 3 is designed, where the file merging system includes a service detection module, a cache pool merging module, a data pool merging module, a small file migration module, a key index monitoring module, an intelligent flow control module, an intelligent splitting module, and a metadata management module. These modules are all software modules. The function of each module and the data interaction between the modules are explained in detail below.
(1) Service request detection module. This module inspects each request in real time and judges, according to a preset policy, whether the files need to be merged.
(2) Cache pool merging module. This module merges small files in the cache pool. If small files need to be merged in the cache pool, the module mounts them into the memory queue corresponding to the cache pool; once enough small files are queued, they are merged into one large file and written to the SSD storage medium backing the cache pool.
(3) Data pool merging module. This module merges small files in the data pool. If small files need to be merged directly in the data pool, the module mounts them into the memory queue corresponding to the data pool; once enough small files are queued, they are merged into one large file and written to the HDD storage medium backing the data pool.
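Both merging modules follow the same queue-then-merge pattern, which can be sketched as below. The class name, the byte threshold, and the byte-concatenation merge are illustrative assumptions, not the patent's implementation; the cache pool and data pool would each own one such queue with a different backing writer.

```python
class MergeQueue:
    """Accumulates small files in memory and flushes one merged large file
    once enough bytes are queued."""

    def __init__(self, write_blob, merge_threshold=4 * 1024 * 1024):
        self.write_blob = write_blob          # writes to the SSD or HDD medium
        self.merge_threshold = merge_threshold
        self.pending = []                     # queued (name, payload) pairs
        self.pending_bytes = 0

    def mount(self, name, payload):
        """Mount one small file into the memory queue."""
        self.pending.append((name, payload))
        self.pending_bytes += len(payload)
        if self.pending_bytes >= self.merge_threshold:
            self.flush()

    def flush(self):
        """Merge all queued small files into one large file and write it out."""
        if not self.pending:
            return
        blob = b"".join(payload for _, payload in self.pending)
        self.write_blob(blob)
        self.pending.clear()
        self.pending_bytes = 0
```

For example, with a 10-byte threshold, two 5-byte files trigger exactly one merged write.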
(4) Small file migration module. This module periodically reads the large files of the cache pool from the SSD storage medium and writes them into the data pool.
(5) Key index monitoring module. This module monitors and calculates the key indicators: front-end service pressure, cache pool water level, and data pool load. Front-end service pressure is the amount of data written into the cache pool by the front-end service per second, i.e., the data bandwidth. Cache pool water level is the proportion of dirty data in the cache pool, i.e., the percentage of the cache pool's total capacity occupied by dirty data; dirty data is data that has just been written into the cache pool and has not yet been migrated to the data pool. Data pool load is the utilization rate of the storage media in the data pool. The indicators calculated by this module are used by the intelligent flow control module and the intelligent splitting module to decide whether and how to split.
(6) Intelligent flow control module. Based on the key indicators calculated by the key index monitoring module, this module calculates in real time the maximum service bandwidth the system can currently carry; if that limit is exceeded, it triggers the front-end service flow control mechanism to limit the write bandwidth of the front-end service and ensure system stability.
(7) Intelligent splitting module. When excessive front-end service pressure drives the cache pool toward its performance bottleneck, this module transfers part of the front-end pressure to the data pool, based on the key indicators calculated by the key index monitoring module, so that small-file merging completes directly on the data pool without passing through the cache pool. The structure of the intelligent splitting module is shown in fig. 4: it comprises a trend prediction unit, a machine learning unit, a split ratio calculation unit, and a model configuration unit, described in turn below.
(a) Trend prediction unit. This unit predicts the cache pool water level trend. It adopts a trend prediction algorithm based on time series analysis, with an ARIMA (autoregressive integrated moving average) model at its core, to analyze the front-end service write bandwidth: the front-end service pressure is decomposed into a trend component, a periodic component, and a residual sequence, and the analysis of the trend component together with the data pool load is used to judge whether splitting is necessary. The periodic component is the periodic pattern the front-end service pressure produces in the cache pool's monitoring indicators; the period may be a day, a week, or a month. For example, the front-end service pressure may peak during the 8 a.m. rush hour every day and reach its lowest value around 6 p.m. The trend component is whether the front-end service pressure is rising, falling, or fluctuating within a certain range. The residual sequence is the original training sequence minus the fitted sequence; the closer the residuals are to a random error distribution (normal with mean 0), the better the model fits. The unit's input is time series data of the form {time 1, cache pool level 1}, {time 2, cache pool level 2}, {time 3, cache pool level 3}, …; its output is the cache pool water level trend, i.e., {rising, time required, confidence}, {falling, time required, confidence}, or {holding, duration, confidence}.
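The unit's input/output shape can be sketched as follows. This is a hedged simplification: a least-squares slope stands in for the full ARIMA fit, and the confidence formula is a made-up heuristic, not the patent's.

```python
def predict_level_trend(samples):
    """samples: list of (time, cache_pool_level) pairs, oldest first.

    Returns ("rising" | "falling" | "holding", confidence) based on the
    least-squares slope of the level series -- a simplified stand-in for
    the ARIMA-based trend prediction unit.
    """
    n = len(samples)
    ts = list(range(n))                    # use sample index as the time axis
    ys = [level for _, level in samples]
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    if abs(slope) < 1e-3:
        return "holding", 1.0 - abs(slope)
    direction = "rising" if slope > 0 else "falling"
    return direction, min(1.0, abs(slope) * n)  # crude confidence proxy
```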
(b) Machine learning unit. This unit has two functions. The first is to analyze historical data and distill a split evaluation model. The model's input items are {front-end service pressure × its weight, cache pool water level × its weight, data pool load × its weight}; its output items are {split ratio, time required for the cache pool water level to fall below a safety threshold, confidence}. The model is trained on historical front-end service pressure, historical cache pool water level, historical data pool load, and the split ratio that was in effect at the time. The weight values mainly reflect each factor's influence on the prediction result and its stability, where stability means the degree of randomness with which the factor varies. In general, the weights may be set to 30% for front-end service pressure, 40% for cache pool water level, and 40% for data pool load, adjusted according to the actual situation. The second function is to provide a similarity calculation model based on the Pearson correlation coefficient and the Euclidean distance, which compares the current front-end service pressure with the historical contemporaneous front-end service pressure as the reference for whether to split.
The similarity calculation model takes two groups of time series data as input. One group is historical data, i.e., {historical time 1, historical service pressure 1}, {historical time 2, historical service pressure 2}, {historical time 3, historical service pressure 3}, …; the other is recent data, i.e., {recent time 1, recent service pressure 1}, {recent time 2, recent service pressure 2}, {recent time 3, recent service pressure 3}, …. The output item is {similarity percentage of the two groups}; when the similarity exceeds 95%, the current front-end service pressure is considered highly similar to the historical service pressure.
(c) Split ratio calculation unit. This unit automatically calculates the optimal split ratio based on the analysis results given by the trend prediction unit, the machine learning unit, and the model configuration unit. Its calculation flow is as follows: first, judge whether the current service pressure is similar to that of some historical period according to the similarity calculation method of the machine learning unit; if the similarity is high, split according to the historical split ratio; if not, judge whether an upward trend is expected according to the result of the trend prediction unit, and if so, split using the ratio calculated by the machine learning unit. The detailed process is shown in fig. 5.
(d) Model configuration unit. This unit provides methods for manually tuning the parameters of the trend prediction unit and the machine learning unit. It also supports manually marking the business peaks and troughs of holidays and of daily on-duty/off-duty hours, effectively supplementing the machine learning unit's algorithm.
(8) Metadata management module. This module records key information such as the position and size of each small file, the mapping from small files to large files, the position of each large file, and the hole information left in a large file after a small file is deleted.
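The bookkeeping this module maintains can be sketched with simple record types; the class and field names, and the example values, are illustrative assumptions rather than the patent's data layout.

```python
from dataclasses import dataclass, field

@dataclass
class SmallFileEntry:
    large_file_id: str   # which merged large file holds this small file
    offset: int          # byte offset of the small file inside the large file
    size: int            # small-file size in bytes

@dataclass
class LargeFileMeta:
    location: str        # pool and object/path of the large file
    size: int
    holes: list = field(default_factory=list)  # (offset, size) freed by deletes

# Example: record one small file inside a merged large file,
# then record the hole left behind when it is deleted.
index = {"img_001.jpg": SmallFileEntry("blob-17", offset=0, size=4096)}
meta = LargeFileMeta("data-pool/blob-17", size=4 * 1024 * 1024)
meta.holes.append((0, 4096))
```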
Based on the file merging system introduced above, the storage system file merging method based on intelligent splitting of the present invention can be realized. The entire process is described below in conjunction with fig. 5.
Step one: while the system is running, the key index monitoring module monitors and calculates the key indicators (front-end service pressure, cache pool water level, and data pool load) in real time, and the intelligent flow control module calculates from them the maximum service bandwidth the system can currently carry. If the current front-end service pressure exceeds this maximum, the front-end service flow control mechanism is triggered to limit the write bandwidth of the front-end service and keep the system stable. If it does not, the service request detection module inspects each request in real time and judges, according to the preset policy, whether the files need to be merged: if so, execute step two; if not, end directly.
Step two: check the cache pool water level calculated by the key index monitoring module against the water level threshold:
if the cache pool water level does not exceed the water level threshold, perform non-splitting processing and execute step eight;
if the cache pool water level exceeds the water level threshold, execute step three.
Step three: check the data pool load calculated by the key index monitoring module against the load threshold:
if the data pool load exceeds the load threshold, perform non-splitting processing and execute step eight;
if the data pool load does not exceed the load threshold, execute step four.
Step four, the machine learning unit in the intelligent splitting module judges whether the recent front-end service pressure (that is, the front-end service pressure over the last several periods) is similar to the front-end service pressure of the same period in history:
if the machine learning unit outputs that the front-end service pressure of the last several periods is not similar to the historical contemporaneous front-end service pressure (that is, the similarity is less than or equal to the similarity threshold), step five is executed;
if the machine learning unit outputs that the front-end service pressure of the last several periods is similar to the historical contemporaneous front-end service pressure (that is, the similarity is greater than the similarity threshold), step six is executed.
Step five, when the front-end service pressure of the last several periods is not similar to the historical contemporaneous front-end service pressure, the trend prediction unit predicts the current cache pool water level trend based on the recent cache pool water level data:
if the trend prediction unit predicts an ascending trend, splitting processing is performed and step nine is executed;
if the trend prediction unit predicts a descending trend, non-splitting processing is performed and step eight is executed.
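The patent's trend prediction unit is a time series analysis model trained on historical water level data; as a hedged stand-in, the sketch below fits a least-squares slope to recent water level samples and reads its sign. The function name and the 'up'/'down' labels are illustrative assumptions.

```python
# Hypothetical stand-in for the trend prediction unit: fit a least-squares
# slope to the recent cache pool water level samples; a positive slope is
# read as an ascending trend, a non-positive slope as a descending one.

def predict_trend(levels):
    """Return 'up' or 'down' for a sequence of water level samples."""
    n = len(levels)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(levels) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, levels)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return 'up' if slope > 0 else 'down'

assert predict_trend([0.60, 0.64, 0.70, 0.75]) == 'up'
assert predict_trend([0.80, 0.72, 0.65, 0.61]) == 'down'
```

A trained time series model would replace this slope test in practice; the sketch only shows where the ascending/descending decision plugs into steps five through seven.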
Step six, when the front-end service pressure of the last several periods is similar to the historical contemporaneous front-end service pressure, the trend prediction unit predicts the historical contemporaneous cache pool water level trend based on the historical contemporaneous cache pool water level data:
if the trend prediction unit predicts an ascending trend, splitting processing is performed and step nine is executed;
if the trend prediction unit predicts a descending trend, splitting is not performed for the moment, and step seven is executed.
Step seven, the trend prediction unit predicts the current cache pool water level trend based on the recent cache pool water level data:
if the trend prediction unit predicts an ascending trend, splitting processing is performed and step nine is executed;
if the trend prediction unit predicts a descending trend, non-splitting processing is performed and step eight is executed.
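The decision chain of steps two through seven can be condensed into a single sketch. Everything here is illustrative: the function and parameter names are assumptions, and the 'up'/'down' trend labels stand in for the trend prediction unit's output, while the similarity value stands in for the machine learning unit's comparison.

```python
# Hedged sketch of the splitting decision in steps two through seven.
# Inputs are the key indexes and the outputs of the (assumed) trend
# prediction and similarity units; returns True for splitting processing.

def should_split(cache_water_level, water_threshold,
                 data_pool_load, load_threshold,
                 similarity, similarity_threshold,
                 recent_trend, historical_trend):
    """Decide splitting (True) vs non-splitting (False) processing."""
    if cache_water_level <= water_threshold:   # step two: cache has room
        return False
    if data_pool_load > load_threshold:        # step three: data pool busy
        return False
    if similarity <= similarity_threshold:     # step four -> step five
        return recent_trend == 'up'
    if historical_trend == 'up':               # step six: historical period rises
        return True
    return recent_trend == 'up'                # step seven: fall back to recent data

# High water level, idle data pool, no historical match, rising level: split.
assert should_split(0.9, 0.7, 0.4, 0.8, 0.3, 0.5, 'up', 'down') is True
# Water level below its threshold: never split (step two short-circuits).
assert should_split(0.5, 0.7, 0.4, 0.8, 0.9, 0.5, 'up', 'up') is False
```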
Step eight, non-splitting processing is performed: the small files are mounted into the memory queue corresponding to the cache pool through the cache pool merging module; once enough small files have accumulated in the queue, they are merged into one large file and written into the SSD storage medium corresponding to the cache pool, and the small file migration module periodically reads the large files out of the cache pool and writes them into the data pool.
Step nine, splitting processing is performed: first, the split proportion is calculated by the machine learning unit. Specifically, the product of the front-end service pressure and its weight, the product of the cache pool water level and its weight, and the product of the data pool load and its weight are calculated, and the three products are input into the split evaluation model, which outputs the split proportion. According to this proportion, one part of the small files is mounted into the memory queue corresponding to the cache pool through the cache pool merging module, merged into large files once enough have accumulated, written into the SSD storage medium corresponding to the cache pool, and later read out and written into the data pool by the small file migration module; the remaining small files are mounted into the memory queue corresponding to the data pool through the data pool merging module and, once enough small files are present in the queue, merged into one large file and written into the HDD storage medium corresponding to the data pool.
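The mechanics of steps eight and nine can be sketched as follows. `MergeQueue` stands in for the memory queue behind the cache pool and data pool merging modules, and `split_proportion` stands in for the trained split evaluation model; the batch size, the weights, and the clamped linear scoring are illustrative assumptions, not details taken from the patent.

```python
# Hedged sketch of steps eight and nine: small files accumulate in a
# memory queue and are merged into one large sequential write, and a
# placeholder model turns the three weighted key indexes into a split
# proportion.

class MergeQueue:
    """Accumulate small files and flush them as one merged large file."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = []        # small files waiting to be merged
        self.written = []        # large files flushed to the storage medium

    def mount(self, payload):
        self.pending.append(payload)
        if len(self.pending) >= self.batch_size:
            # Merge many small random writes into one large sequential write.
            self.written.append(b"".join(self.pending))
            self.pending.clear()

def split_proportion(pressure, water_level, load, weights=(0.3, 0.5, 0.2)):
    """Fraction of small files diverted straight to the data pool.

    The three weighted products mirror the inputs of the split
    evaluation model; the clamped linear score is only a placeholder
    for the trained model's output.
    """
    wp, ww, wl = weights
    score = pressure * wp + water_level * ww - load * wl
    return min(1.0, max(0.0, score))

# Non-splitting (step eight): everything goes through the cache pool queue.
cache_queue = MergeQueue(batch_size=2)
cache_queue.mount(b"aa")
cache_queue.mount(b"bb")
assert cache_queue.written == [b"aabb"]

# Splitting (step nine): a high water level diverts a sizeable share.
assert 0.0 < split_proportion(0.8, 0.9, 0.2) < 1.0
```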
This completes the storage system file merging method based on intelligent splitting. Viewed as a whole, the file merging system and the file merging method implemented on it have the following characteristics:
1) An intelligent splitting module is provided. The trend prediction unit in the module predicts the cache pool water level trend, and the machine learning unit both calculates the split proportion used for splitting processing and performs the similarity calculation that compares the current front-end service pressure with the historical contemporaneous front-end service pressure. With these data, intelligent splitting is realized and the split bandwidth proportion is adjusted automatically, which keeps the front-end service latency stable under heavy-pressure scenarios.
2) With this splitting method, performance close to that of an NVMe SSD can be achieved even when cheaper SATA SSDs or SAS SSDs are adopted, so cost is reduced effectively while the hardware performance is fully exploited.
An embodiment of the device is as follows:
an embodiment of the storage system file merging device based on intelligent splitting according to the present invention is shown in fig. 6 and includes a memory, a processor, and an internal bus, where the processor and the memory communicate and exchange data with each other through the internal bus. The memory stores at least one software functional module, and the processor performs various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the storage system file merging method based on intelligent splitting described in the method embodiment of the present invention.
The processor can be a microprocessor (MCU), a programmable logic device (FPGA), or another processing device. The memory can be any memory that stores information electrically, such as RAM or ROM; any memory that stores information magnetically, such as a hard disk, floppy disk, magnetic tape, magnetic core memory, bubble memory, or USB flash drive; any memory that stores information optically, such as a CD or DVD; and, of course, memory in other forms, such as quantum memory or graphene memory.

Claims (10)

1. A storage system file merging method based on intelligent splitting, characterized by comprising the following steps:
1) when small files need to be merged, acquiring the cache pool water level and the data pool load, and judging whether the cache pool water level exceeds a water level threshold and whether the data pool load exceeds a load threshold; the data pool load is the utilization rate of the storage medium in the data pool, the cache pool water level is the ratio of the total capacity of dirty data in the cache pool to the total capacity of the cache pool, and the dirty data is data that has been written into the cache pool but not yet migrated to the data pool;
2) when the cache pool water level exceeds the water level threshold and the data pool load does not exceed the load threshold, acquiring the recent front-end service pressure and comparing the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure:
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is less than or equal to a similarity threshold, predicting the current cache pool water level trend based on the recent cache pool water level, and determining whether to perform splitting processing or non-splitting processing according to the prediction result;
if the similarity between the recent front-end service pressure and the historical contemporaneous front-end service pressure is greater than the similarity threshold, predicting the historical contemporaneous cache pool water level trend based on the historical contemporaneous cache pool water level, and determining whether to perform splitting processing or non-splitting processing according to the prediction result;
the front-end service pressure is the amount of data written into the cache pool by the front-end service per second; non-splitting processing means that the small files are placed directly into the cache pool to be merged; splitting processing means that the small files are placed into the cache pool and the data pool, according to the set split proportion, to be merged.
2. The storage system file merging method based on intelligent splitting according to claim 1, wherein in step 2), splitting processing is performed if the prediction result of predicting the historical contemporaneous cache pool water level trend is an ascending trend.
3. The storage system file merging method based on intelligent splitting according to claim 1, wherein in step 2), if the prediction result of predicting the historical contemporaneous cache pool water level trend is a descending trend, splitting is temporarily not performed, the current cache pool water level trend is predicted based on the recent cache pool water level, and splitting processing or non-splitting processing is determined according to the prediction result.
4. The storage system file merging method based on intelligent splitting according to claim 1 or 3, wherein in step 2), splitting processing is performed if the prediction result of predicting the current cache pool water level trend is an ascending trend; non-splitting processing is performed if the prediction result of predicting the current cache pool water level trend is a descending trend.
5. The storage system file merging method based on intelligent splitting according to claim 1, wherein in step 1), non-splitting processing is performed if the cache pool water level does not exceed the water level threshold or the data pool load exceeds the load threshold.
6. The storage system file merging method based on intelligent splitting according to claim 1, wherein a similarity calculation model based on the Pearson correlation coefficient and/or a Euclidean distance algorithm is used to compare the recent front-end service pressure with the historical contemporaneous front-end service pressure.
7. The storage system file merging method based on intelligent splitting according to claim 1, wherein a time series analysis model is used to predict the historical contemporaneous cache pool water level trend based on the historical contemporaneous cache pool water level, or to predict the current cache pool water level trend based on the recent cache pool water level; the time series analysis model is obtained by training with historical cache pool water level trend data.
8. The storage system file merging method based on intelligent splitting according to claim 1, wherein the split proportion is calculated using a split evaluation model together with the current front-end service pressure, the current cache pool water level, and the current data pool load; the inputs of the split evaluation model are the product of the front-end service pressure and its weight, the product of the cache pool water level and its weight, and the product of the data pool load and its weight, and its output includes the split proportion; the split evaluation model is obtained by training with historical front-end service pressure, historical cache pool water level, historical data pool load, and the split proportions in effect at those historical moments.
9. The storage system file merging method based on intelligent splitting according to claim 1, further comprising, before step 1): calculating in real time the maximum front-end service pressure that can currently be borne, and judging whether the current front-end service pressure exceeds the maximum front-end service pressure: if it does, triggering a front-end service flow control mechanism to limit the write bandwidth of the front-end service; if it does not, performing step 1).
10. A storage system file merging device based on intelligent splitting, comprising a memory and a processor, wherein the processor is configured to execute instructions stored in the memory to implement the storage system file merging method based on intelligent splitting according to any one of claims 1 to 9.
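Purely as an illustration (not part of the claims), the key metric definition of claim 1 and the Pearson-based similarity comparison of claim 6 can be sketched in plain Python; the function names are assumptions.

```python
# Hedged sketch of the claim 1 water level definition and the claim 6
# Pearson correlation, implemented without external libraries.

import math

def cache_pool_water_level(dirty_bytes, total_bytes):
    """Claim 1: water level = total dirty data capacity / total capacity."""
    return dirty_bytes / total_bytes

def pearson_similarity(recent, historical):
    """Claim 6: correlate recent vs historical contemporaneous pressure."""
    n = len(recent)
    mr = sum(recent) / n
    mh = sum(historical) / n
    cov = sum((r - mr) * (h - mh) for r, h in zip(recent, historical))
    sr = math.sqrt(sum((r - mr) ** 2 for r in recent))
    sh = math.sqrt(sum((h - mh) ** 2 for h in historical))
    return cov / (sr * sh)

assert cache_pool_water_level(30, 100) == 0.3
# Perfectly correlated pressure curves give similarity 1.0.
assert abs(pearson_similarity([1, 2, 3], [2, 4, 6]) - 1.0) < 1e-9
```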
CN202111074845.1A 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting Active CN113722072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111074845.1A CN113722072B (en) 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting

Publications (2)

Publication Number Publication Date
CN113722072A true CN113722072A (en) 2021-11-30
CN113722072B CN113722072B (en) 2024-02-13

Family

ID=78683638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111074845.1A Active CN113722072B (en) 2021-09-14 2021-09-14 Storage system file merging method and device based on intelligent shunting

Country Status (1)

Country Link
CN (1) CN113722072B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083344A1 (en) * 2007-09-26 2009-03-26 Hitachi, Ltd. Computer system, management computer, and file management method for file consolidation
CN108595567A (en) * 2018-04-13 2018-09-28 郑州云海信息技术有限公司 A kind of merging method of small documents, device, equipment and readable storage medium storing program for executing
CN112631521A (en) * 2020-12-25 2021-04-09 苏州浪潮智能科技有限公司 Method, system, equipment and medium for controlling water level of cache pool


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114465957A (en) * 2021-12-29 2022-05-10 天翼云科技有限公司 Method and device for writing data
CN114465957B (en) * 2021-12-29 2024-03-08 天翼云科技有限公司 Data writing method and device
CN114996674A (en) * 2022-06-21 2022-09-02 中银金融科技有限公司 Data processing system and method
CN115291809A (en) * 2022-08-25 2022-11-04 济南浪潮数据技术有限公司 Data brushing speed control method, device and medium
CN117909060A (en) * 2023-12-14 2024-04-19 天翼云科技有限公司 Deep learning dynamic weight adjustment method applied to super-fusion storage pool
CN117648297A (en) * 2024-01-30 2024-03-05 中国人民解放军国防科技大学 Method, system, equipment and medium for offline merging of small files based on object storage
CN117648297B (en) * 2024-01-30 2024-06-11 中国人民解放军国防科技大学 Offline merging method, system, device and medium of small files based on object storage
CN118522435A (en) * 2024-05-10 2024-08-20 北京零美科技有限公司 Wireless medical monitoring vital sign system and monitor

Also Published As

Publication number Publication date
CN113722072B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN113722072A (en) Storage system file merging method and device based on intelligent distribution
CN110413227B (en) Method and system for predicting remaining service life of hard disk device on line
CN105630638B (en) For the apparatus and method for disk array distribution caching
CN105653591B (en) A kind of industrial real-time data classification storage and moving method
CN110096350B (en) Cold and hot area division energy-saving storage method based on cluster node load state prediction
CN103685542B (en) Cloud virtual machine migration method, device and system
CN103399713A (en) Data buffering method for balancing multistage memory property and solid-state disk service life
CN118051190B (en) Data protection method and system for solid state disk
CN118672520B (en) Hierarchical storage method and device for file data, medium and electronic equipment
CN111367469A (en) Layered storage data migration method and system
US12541505B2 (en) Record process storage system and method with automatic buffer interval updates
CN113254256A (en) Data reconstruction method, storage device and storage medium
EP4078380B1 (en) Behavior-driven die management on solid-state drives
CN119961189A (en) A cache management optimization method and system
CN119396346A (en) Method, device, equipment and storage medium for disk space management
CN118069658A (en) A control method for a database system and a database system
CN120428922B (en) A disk scheduling method
CN113515238B (en) Data scheduling method and system based on hierarchical storage and electronic equipment
CN118838759B (en) A computing power data operation and maintenance processing method and system based on AI intelligence
Ha et al. Dynamic hot data identification using a stack distance approximation
CN120492497A (en) Cache management method, device, system and storage medium
CN119148950A (en) Solid state disk storage performance optimization method and system based on load balancing
CN117950583A (en) Cloud hard disk data guaranteeing method and system based on adaptive prediction of usable time
US20230325257A1 (en) Workload measures based on access locality
CN115185766A (en) Comprehensive prediction method and system for memory energy consumption

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221011

Address after: 518071 4011, Block A, Zhongguan Times Square, No. 4168, Liuxian Avenue, Pingshan Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong

Applicant after: Huarui Index Cloud Technology (Shenzhen) Co.,Ltd.

Address before: 471399 No. 1, Huali electronic technology Animation Industrial Park, Hebin sub district office, Yichuan County, Luoyang City, Henan Province

Applicant before: Huarui index cloud (Henan) Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant