CN106648469B - Cache data processing method and device and storage controller - Google Patents

Cache data processing method and device and storage controller

Info

Publication number
CN106648469B
Authority
CN
China
Prior art keywords
data
target stripe
stripe
cache
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611248587.3A
Other languages
Chinese (zh)
Other versions
CN106648469A (en)
Inventor
江武汉
门勇
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201611248587.3A priority Critical patent/CN106648469B/en
Publication of CN106648469A publication Critical patent/CN106648469A/en
Priority to PCT/CN2017/118147 priority patent/WO2018121455A1/en
Application granted
Publication of CN106648469B publication Critical patent/CN106648469B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 - Organizing or formatting or addressing of data
    • G06F3/064 - Management of blocks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 - Improving or facilitating administration, e.g. storage management
    • G06F3/0607 - Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 - Data buffering arrangements

Abstract

A cache data processing method, and a corresponding apparatus and storage controller, are disclosed. The data processing method comprises the following steps: obtaining a compression ratio D, a deduplication ratio C, an average stripe filling rate I, a target stripe size S in a cache, and a valid data size W in the target stripe; obtaining an eviction data amount N according to the formula

N = (S × D × C − W) / I;

and performing one of the following operations according to the value of N: if Tmin ≤ N ≤ Tmax, writing data of size N into the target stripe and evicting the data of the target stripe to the storage medium, where Tmin is a preset first threshold and Tmax is a preset second threshold; if N > Tmax, writing data of size N into the target stripe; if N < Tmin, evicting the target stripe's data to the storage medium. With this scheme, the point in time at which cached data is destaged is designed more reasonably.

Description

Cache data processing method and device and storage controller
Technical Field
The present invention relates to storage technology, and in particular to cache technology in the storage field.
Background
A cache is an important technique for solving the speed mismatch between high-speed and low-speed devices. It is widely applied in fields such as storage systems, where it can reduce application response time and improve efficiency.
In a storage system, in order to improve the performance of the storage system, a cache layer (also called write-back cache) is usually added between a processing device and a hard disk. The cache layer may improve system read and write performance, and is sometimes referred to as a performance layer.
The medium used by the cache is, for example, a solid state disk (SSD), whose read/write performance is better than that of a persistent storage medium such as a hard disk, so write requests from host applications can be processed more quickly than with a hard disk alone. The storage capacity of the cache is limited, however, so when its free space is insufficient, part of the data in the cache medium must be evicted to the hard disk according to an eviction algorithm, to free up space for subsequent write requests. This portion of data sent from the cache to the hard disk is called dirty data.
To give the cache enough free space as soon as possible, dirty data needs to be evicted to the hard disk promptly when free space runs low; otherwise, write requests are difficult to process in time.
In the storage field, dirty data in a cache is usually sent to the hard disk at the granularity of a stripe. A stripe is a logical concept, similar to a container for data. If the destaged data is less than one full stripe, only part of the stripe holds valid data, meaning the stripe has free positions; see the stripe in Fig. 1, whose utilization is only 50%. A stripe that is not "filled" with valid data occupies the same complete stripe of storage space on the hard disk, and generates the same set of metadata, as a stripe that is "filled" with valid data. Thus, the larger the ratio of valid data to free positions in a stripe, the more disk storage space is saved and the less metadata is generated. If a stripe is completely "filled" with valid data before being destaged, the hard disk's storage space is used to the maximum and the minimum amount of metadata is generated.
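As a rough illustration of why partially filled stripes are costly, the disk space wasted by destaging such a stripe can be modeled as below. This is a sketch under this section's assumptions only; the function name and sizes are illustrative, not from the patent.

```python
def stripe_waste(stripe_size: int, valid_bytes: int) -> int:
    """Disk space wasted when a partially filled stripe is destaged.

    A destaged stripe always occupies a full stripe's worth of disk space,
    so every unfilled byte is wasted (illustrative model)."""
    if not 0 <= valid_bytes <= stripe_size:
        raise ValueError("valid data must fit within the stripe")
    return stripe_size - valid_bytes

# The 50%-utilized stripe of Fig. 1, with an assumed 8 MB stripe size, wastes 4 MB:
waste = stripe_waste(8 * 2**20, 4 * 2**20)
print(waste)  # 4194304
```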
Therefore, two conflicting requirements exist for the eviction of dirty data. To obtain more free cache space as soon as possible, dirty data should be evicted to the hard disk as early as possible; ideally, the eviction operation is executed as soon as free space is insufficient. To improve hard disk utilization and reduce metadata, dirty data should be evicted only after its stripe is as full as possible; ideally, the eviction operation is executed only once the stripe is full. How to strike a balance between the two is a problem the industry needs to solve.
The prior art does not solve this problem well: the jitter of read/write operations per second (IOPS) is too large, and the storage space of the hard disk is not fully utilized.
Disclosure of Invention
Embodiments of the present invention provide a cache data processing method, an apparatus, and a storage controller, which can alleviate severe IOPS fluctuation and thereby make the performance of the storage system more stable.
In a first aspect of the present invention, a method for processing cache data is provided. The method includes: obtaining a compression ratio D, a deduplication ratio C, an average stripe filling rate I, a target stripe size S in a cache, and a valid data size W in the target stripe; and obtaining an eviction data amount N according to the formula

N = (S × D × C − W) / I.

If Tmin ≤ N ≤ Tmax, data of size N is written into the target stripe and the data of the target stripe is evicted to the storage medium, where Tmin is a preset first threshold and Tmax is a preset second threshold. In another alternative, if N > Tmax, data of size N is written into the target stripe. In another alternative, if N < Tmin, the target stripe's data is evicted to the storage medium. All three modes are optional, and executing any one of them forms a complete flow. Based on this method, without noticeably affecting the cache, stripes containing as much data as possible are evicted to the storage medium. A balance is struck between the conflicting requirements of obtaining free cache space as early as possible and improving disk utilization. Cached data is evicted to the storage medium more smoothly, severe IOPS fluctuation is reduced, and the storage system is more stable.
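A minimal sketch of the eviction-amount formula, in Python for illustration; the function and parameter names are this sketch's own, and the round-down behavior follows one of the alternatives discussed later in the description.

```python
import math

def eviction_amount(S: float, W: float, D: float, C: float, I: float) -> int:
    """N = (S * D * C - W) / I, rounded down to a whole unit.

    S: target stripe size; W: valid data already in the stripe (same unit as S);
    D: compression ratio; C: deduplication ratio; I: average stripe fill rate.
    All three ratios default to 1 before any statistics exist."""
    return math.floor((S * D * C - W) / I)

# With neutral ratios (D = C = I = 1), N is simply the stripe's free space:
print(eviction_amount(2048, 1500, 1, 1, 1))  # 548
```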
In a first possible implementation manner of the first aspect, after the data of size N is written into the target stripe, the method further includes: evicting the target stripe to the storage medium after the target stripe has been filled with subsequent data. This possible implementation provides the eviction scheme for the case N > Tmax.
In a second possible implementation manner of the first aspect, D, C, and I are statistical values whose initial values are all 1.
In a third possible implementation manner of the first aspect, S is described by a number of minimum units (or integer multiples of the minimum unit) used when destaging data from the cache to the storage medium, for example by a number of cache pages.
In a second aspect of the present invention, an apparatus for processing cache data is provided. The apparatus comprises: an acquisition module, configured to obtain a compression ratio D, a deduplication ratio C, and an average stripe filling rate I of the data, and further configured to obtain a target stripe size S in a cache and a valid data size W in the target stripe; a calculation module, configured to calculate an eviction data amount N according to the formula

N = (S × D × C − W) / I;

and a data processing module, configured to: if Tmin ≤ N ≤ Tmax, write data of size N into the target stripe and evict the data of the target stripe to the storage medium, where Tmin is a preset first threshold and Tmax is a preset second threshold. In another alternative, the data processing module is configured to write data of size N into the target stripe if N > Tmax. In another alternative, if N < Tmin, the data processing module is configured to evict the target stripe's data to the storage medium. All three modes are optional, and executing any one of them forms a complete flow. Based on this cache data processing apparatus, without noticeably affecting the cache, stripes containing as much data as possible are evicted to the storage medium. A balance is struck between the conflicting requirements of obtaining free cache space as early as possible and improving disk utilization. Cached data is evicted to the storage medium more smoothly, severe IOPS fluctuation is reduced, and the storage system is more stable.
In a first possible implementation manner of the second aspect of the present invention, the processing module is further configured to: after data of size N is written into the target stripe, evict the target stripe to the storage medium once it has been filled with subsequent data. This possible implementation provides the eviction scheme for the case N > Tmax.
In a second possible implementation manner of the second aspect of the present invention, D, C, and I are statistical values whose initial values are all 1.
In a third possible implementation manner of the second aspect of the present invention, S is described by a number of minimum units (or integer multiples of the minimum unit) used when destaging data from the cache to the storage medium, for example by a number of cache pages.
A third aspect of the present invention provides a storage controller. The storage controller includes a processor and a cache, where the cache is used to temporarily store data; optionally, it may further include a storage medium (e.g., a hard disk or a solid state disk) for storing evicted data. By running a program, the processor executes the following steps: obtaining a compression ratio D, a deduplication ratio C, and an average stripe filling rate I of the data, and further obtaining a target stripe size S in the cache and a valid data size W in the target stripe; obtaining an eviction data amount N according to the formula

N = (S × D × C − W) / I;

and, if Tmin ≤ N ≤ Tmax, writing data of size N into the target stripe and evicting the data of the target stripe to the storage medium, where Tmin is a preset first threshold and Tmax is a preset second threshold. In another alternative, if N > Tmax, data of size N is written into the target stripe. In another alternative, if N < Tmin, the target stripe's data is evicted to the storage medium. All three modes are optional, and executing any one of them forms a complete flow. Based on this storage controller, without noticeably affecting the cache, stripes containing as much data as possible are evicted to the storage medium. A balance is struck between the conflicting requirements of obtaining free cache space as early as possible and improving disk utilization.
In a first possible implementation manner of the third aspect of the present invention, the processor is further configured to: after data of size N is written into the target stripe, evict the target stripe to the storage medium once it has been filled with subsequent data. This possible implementation provides the eviction scheme for the case N > Tmax.
In a second possible implementation manner of the third aspect of the present invention, D, C, and I are statistical values whose initial values are all 1.
In a third possible implementation manner of the third aspect of the present invention, S is described by a number of minimum units (or integer multiples of the minimum unit) used when destaging data from the cache to the storage medium, for example by a number of cache pages.
In a fourth aspect of the present invention, a storage system is further provided, which includes the storage controller described above and a storage medium (e.g., a hard disk or solid state disk), where the storage medium is used to store evicted data.
In a fifth aspect of the present invention, a storage system is further provided, which includes the data processing apparatus described above and a storage medium (e.g., a hard disk or solid state disk), where the storage medium is used to store evicted data.
Drawings
FIG. 1 is a schematic illustration of stripe usage;
FIG. 2 is a block diagram of an embodiment of the memory system of the present invention;
FIG. 3 is a flow chart of an embodiment of a method for processing cache data according to the present invention;
FIG. 4 is a topology diagram of an embodiment of a cache data processing apparatus.
Detailed Description
The solution of the present application can be applied to a storage device, for example a storage controller. The storage controller and the storage medium (e.g., hard disk, solid state disk) are relatively independent and together form a storage system. Unless otherwise stated, the following embodiments take this case as an example, with a hard disk as the example storage medium.
It should be noted that the storage device may also be a device that has both operation management capability and a storage medium, such as a server. The part of the server with operation management capability corresponds to the storage controller, which communicates with a storage medium inside or outside the storage device. Since the principle is similar, it is not described in detail.
Fig. 2 is a block diagram of an embodiment of the storage system of the present invention. The host 11 communicates with a storage system, which includes a storage controller 12 and a storage medium 13. The storage controller 12 includes a processor 121, a cache 122, and a memory 123. The memory 123 may provide space for the processor 121 to execute a program; by executing the program, the processor can carry out the cache data processing method embodiments provided by the present invention. The memory 123 and the cache 122 may be integrated or may be separate. The storage controller 12 and the storage medium 13 are shown as separate; in other embodiments, the storage medium 13 may also be integrated into the storage controller 12, as in a general-purpose server.
The cache may be read and written by the host 11. Since the cache space is limited, when free cache space becomes small, data in the cache needs to be migrated to the storage medium 13. Specifically, the valid stripe data in the cache 122 is migrated to the storage medium 13 in stripe units. This migration is also called destaging; the earlier the destaging, the earlier the cache can free up space.
In block storage technology, cache data is recorded at stripe granularity. A stripe is a logical unit, and each stripe is composed of a plurality of strips. When data is destaged from the cache, each hard disk (or group of hard disks) stores one strip of the stripe. Migrating the cache to the hard disk at stripe granularity does not mean the whole stripe is valid data; part of the stripe may be invalid. For example, only part of the stripe may hold valid data, while the remaining strips hold none.
On the one hand, destaging as early as possible lets the cache obtain free storage space as early as possible. On the other hand, after a stripe is destaged, both the valid and invalid data in it occupy disk space; if the destaged stripe does not contain as much valid data as possible, hard disk utilization is low. From the viewpoint of improving disk utilization, destaging is therefore preferably performed not as early as possible but only after the stripe is filled. By applying the algorithm of the embodiments of the present invention, a better balance can be struck between these two contradictory requirements.
Referring to fig. 3, a flow chart of an embodiment of a method for processing cached data according to the present invention is shown, which may be executed by a storage controller.
21, obtaining the deduplication ratio C of the data, the compression ratio D of the data, and the average stripe filling rate I from the memory.
The deduplication ratio C, the compression ratio D, and the average stripe filling rate I are all statistics of historical data. At initialization, their values are all 1; as the system runs, these three values change dynamically as new data is continually deduplicated, compressed, and filled into stripes. After being computed, the statistics may be stored in the memory or in other media.
For each stripe in the cache, the valid data in the stripe can be deduplicated before the stripe is destaged. For each block in the stripe, if the same block already exists on the hard disk, the block duplicates an existing block and need not be destaged. This reduces the amount of data written to the disk and saves disk storage space. The deduplication ratio C is the ratio of the data amounts before and after deduplication in the cache over a past statistical period. The stripes included in the statistics can be all stripes in the cache, or a subset, for example stripes whose destaged data belongs to a particular LUN. The minimum value of the deduplication ratio C is 1 (no data blocks are removed as duplicates, or no deduplication is performed). Deduplicating stripe data before destaging saves disk space.
For each stripe in the cache, besides deduplication, the valid data in the stripe can also be compressed before the stripe is destaged. Compression may be performed after the deduplication operation. The compression ratio D is the ratio of the data amounts before and after stripe compression in the cache over a past period. Theoretically, the minimum value of the compression ratio D is 1 (a compression operation was performed but no data was successfully compressed, or no compression operation was performed). Compressing stripe data before destaging saves disk space. The stripes included in the statistics can be all stripes in the cache, or a subset, for example stripes whose destaged data belongs to a particular LUN.
The average stripe filling rate I is the average, over a past statistical period, of the ratio of each stripe's valid data to the total stripe length in the cache. When calculating I, the valid data counted may be the valid data remaining after the compression and deduplication operations. The stripes included in the statistics can be all stripes in the cache, or a subset, for example stripes whose destaged data belongs to a particular LUN.
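One way the three statistics could be maintained is as running totals over destaged stripes. The class below is an illustrative sketch only (its names and structure are not from the patent), assuming compression runs after deduplication as described above.

```python
class DestageStats:
    """Running statistics over destaged stripes; all ratios start neutral (1)."""

    def __init__(self):
        self.bytes_before_dedup = 0
        self.bytes_after_dedup = 0
        self.bytes_before_compress = 0
        self.bytes_after_compress = 0
        self.fill_rates = []

    def record_stripe(self, before_dedup, after_dedup, after_compress, stripe_size):
        self.bytes_before_dedup += before_dedup
        self.bytes_after_dedup += after_dedup
        self.bytes_before_compress += after_dedup  # compression runs after dedup
        self.bytes_after_compress += after_compress
        self.fill_rates.append(after_compress / stripe_size)

    @property
    def C(self):  # deduplication ratio: data amount before / after dedup
        return self.bytes_before_dedup / self.bytes_after_dedup if self.bytes_after_dedup else 1.0

    @property
    def D(self):  # compression ratio: data amount before / after compression
        return self.bytes_before_compress / self.bytes_after_compress if self.bytes_after_compress else 1.0

    @property
    def I(self):  # average stripe filling rate
        return sum(self.fill_rates) / len(self.fill_rates) if self.fill_rates else 1.0

stats = DestageStats()
stats.record_stripe(before_dedup=100, after_dedup=50, after_compress=25, stripe_size=50)
print(stats.C, stats.D, stats.I)  # 2.0 2.0 0.5
```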
It should be noted that the embodiments of the present invention involve two concepts: memory and cache. The memory in the embodiments is, for example, RAM; it can temporarily store the CPU's operational data and provide space for the computer to run programs. The memory may be integrated with the cache or physically separate from it. The cache in the embodiments may be used to temporarily store stripes; the cache may also be RAM, or other media such as a solid state disk (SSD).
22, obtaining the size S of the target stripe in the cache and the size W of the valid data in the target stripe.
The target stripe size S may be described by the length of the target stripe, for example in bytes, or by a count of unit lengths, such as the number of cache pages occupied by the target stripe.
Similarly, the valid data size W in the target stripe can be described by a length in bytes or by a count of unit lengths. The description of W should be consistent with that of the target stripe size S.
23, obtaining the eviction data amount N according to the formula

N = (S × D × C − W) / I,

and performing one of the following operations according to the value of N.
The compression ratio D and the deduplication ratio C are optional: if no compression is performed, the compression ratio D is absent; if no deduplication is performed, the deduplication ratio C is absent.
When data in the cache is processed for the first time using the present embodiment, D = 1, C = 1, and I = 1 may be used as defaults, because no stripes have been processed historically.
Clearly, the units of N, S, and W must be consistent: when S is in bytes, N is also in bytes. The actual value of N may be an integer multiple of the minimum destage unit that differs from (S × D × C − W) / I by less than one minimum unit. For example, if the minimum destage unit is 10 bytes and (S × D × C − W) / I is calculated as 54 bytes, then one alternative takes N as 50 bytes; another takes N as 60 bytes.
When S is described by a number of cache pages, N is also a number of cache pages. In this case, if (S × D × C − W) / I yields a fraction, the actual value of N may be obtained by rounding. One alternative is to round down: for example, if (S × D × C − W) / I is 10.3 cache pages, N may be 10 cache pages. In another embodiment, the value may be rounded up, so 10.3 cache pages becomes 11 cache pages.
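The rounding rules above can be sketched as a small helper; the function name is illustrative, and the rounding direction is left as a parameter because the embodiment allows either alternative.

```python
import math

def round_to_destage_unit(raw_n: float, unit: int, round_up: bool = False) -> int:
    """Round the raw (S*D*C - W)/I value to an integer multiple of the
    minimum destage unit; the result differs from raw_n by less than one unit."""
    multiples = math.ceil(raw_n / unit) if round_up else math.floor(raw_n / unit)
    return multiples * unit

print(round_to_destage_unit(54, 10))        # 50 (10-byte unit, round down)
print(round_to_destage_unit(54, 10, True))  # 60 (round up)
print(round_to_destage_unit(10.3, 1))       # 10 (cache pages, round down)
```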
24, performing the corresponding operation according to the value of the eviction data amount N.
(1) If Tmin ≤ N ≤ Tmax, write data of size N into the target stripe, and then evict the data of the target stripe to a persistent storage medium (SSD, hard disk, etc.). After the data of size N is written, the target stripe holds that data of size N plus its original data.
Here Tmin is a preset first threshold and Tmax is a preset second threshold, with the same units as N. Tmin and Tmax can be set differently in different scenarios according to different requirements, and are therefore not limited here. For example, Tmin may be 10% of the stripe length × C × D, and Tmax may be 80% of the stripe length × C × D.
(2) If N > Tmax, write data of size N into the target stripe. The difference from (1) is that the destage operation may not be performed on the target stripe for the time being; new data is subsequently written to the stripe, and the stripe is evicted to the persistent storage medium after it is filled.
(3) If N < Tmin, evict the target stripe's data to a persistent storage medium. The difference from (1) is that no data of size N is written into the target stripe; the target stripe is destaged directly.
It should be noted that both (1) and (2) above involve writing cached data into the target stripe. Such a write operation may be understood logically as attributing to the target stripe data that did not previously belong to it. After the write operation, the written data is still physically in the cache; at the logical level, however, the data of size N has been written into the target stripe, i.e., data in the cache layer has been written into the storage pool layer. The cache layer provides reads and writes to the host and mainly interacts with the host; the storage pool temporarily stores data to be written to the hard disk and mainly interacts with the hard disk. They are thus two different functional layers.
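The three-way decision in operations (1) to (3) can be sketched as follows. The callbacks write_fn and destage_fn stand in for the logical write and the destage operation and are assumptions of this sketch, not the patent's interfaces.

```python
def process_target_stripe(N, T_min, T_max, write_fn, destage_fn):
    """Dispatch on N per cases (1)-(3) of the embodiment.

    write_fn(n): logically attribute n units of cached data to the target stripe.
    destage_fn(): evict the target stripe's data to the persistent medium.
    """
    if N < T_min:
        destage_fn()   # (3) stripe already nearly full: destage it as-is
    elif N <= T_max:
        write_fn(N)    # (1) top the stripe up with N units...
        destage_fn()   # ...then destage it
    else:
        write_fn(N)    # (2) N > T_max: write now; destage only after
                       # subsequent writes fill the stripe

# Example: N = 1137 with T_min = 128, T_max = 4096 triggers case (1):
calls = []
process_target_stripe(1137, 128, 4096,
                      lambda n: calls.append(("write", n)),
                      lambda: calls.append(("destage",)))
print(calls)  # [('write', 1137), ('destage',)]
```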
For ease of understanding, the above embodiments are described below with a specific example.
For example, the following values are obtained from the memory: the current deduplication ratio is 2, the current compression ratio is 4, and the current statistical average stripe filling rate I is 0.9. The cache page size is 4 KB and the stripe size is 8 MB, i.e., S = 2K cache pages (8 MB / 4 KB). 15K cache pages have already been written in the current stripe. Assume Tmin = 128 and Tmax = 4K.
According to the formula, N = (S × D × C − W) / I = (2K × 2 × 4 − 15K) / 0.9 ≈ 1137.8. Rounding down gives N = 1137.
This satisfies case (1), since Tmin < N < Tmax. Therefore, according to the embodiment of the present invention, data amounting to 1137 cache pages is written into the current stripe, and then all the data in the current stripe is sent to the hard disk. After the transfer completes, the storage space occupied by the target stripe in the cache is released. This example uses the number of cache pages as S, since data in the cache is commonly sent to the hard disk at cache-page granularity. Other schemes are possible in other embodiments, such as using the stripe size divided by a multiple of the cache page as S. In still other embodiments there may be no concept of cache pages, and the minimum granularity of the stripe size (or an integer multiple of it) is used; or the stripe size itself is taken as S, and after N is calculated its value is converted into a number of cache pages.
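The arithmetic of this worked example can be checked directly (all sizes in 4 KB cache pages, as in the text):

```python
import math

S = (8 * 2**20) // (4 * 2**10)  # 8 MB stripe / 4 KB page = 2048 pages ("2K")
W = 15 * 2**10                  # 15K pages already written
D, C, I = 4, 2, 0.9             # compression ratio, dedup ratio, avg fill rate
T_min, T_max = 128, 4 * 2**10

N = math.floor((S * D * C - W) / I)
print(N)                         # 1137
print(T_min < N < T_max)         # True: write N pages, then destage the stripe
```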
Referring to Fig. 4, the present invention further provides an embodiment of a cache data processing apparatus. The cache data processing apparatus 3 is implemented in hardware or as a virtualized program apparatus and can execute the method above. The cache data processing apparatus 3 includes: a parameter acquisition module, a calculation module 32 connected to the parameter acquisition module, and a data processing module 33 connected to the calculation module 32.
The parameter acquisition module is configured to obtain the compression ratio D, the deduplication ratio C, and the average stripe filling rate I of the data, and is further configured to obtain the target stripe size S in the cache and the valid data size W in the target stripe.
The calculation module 32 is configured to obtain the eviction data amount N according to the formula N = (S × D × C − W) / I.
The data processing module 33 is configured to execute one of the following operations according to the value of N: if Tmin ≤ N ≤ Tmax, write data of size N into the target stripe and evict the data of the target stripe to the storage medium, where Tmin is a preset first threshold and Tmax is a preset second threshold; if N > Tmax, write data of size N into the target stripe; if N < Tmin, evict the target stripe's data to the storage medium.
The functions of the modules are described in detail below.
The parameter acquisition module is configured to obtain the deduplication ratio C of the data, the compression ratio D of the data, and the average stripe filling rate I from the memory.
The deduplication ratio C, the compression ratio D and the average sharing and filling rate I are all statistics of historical data. When in initialization, the values are all 1; however, as the system runs, the 3 values will change dynamically due to the continuous new entries being re-deleted, compressed, and filled. The data may be stored in a memory after being counted, or may be stored in other media.
For each stripe in the cache, the valid data in the stripe can be deduplicated before the stripe is destaged to disk. For each data block in the stripe, if an identical block already exists on the hard disk, the block is a duplicate of the existing block and need not be written to disk. This reduces the amount of data destaged and saves disk space. The deduplication ratio C is the ratio of the data amounts before and after deduplication in the cache over a past statistical period. The stripes included in the statistics may be all stripes in the cache, or only some of them, for example the stripes whose destaged data belongs to a particular LUN. The minimum value of the deduplication ratio C is 1 (no data block was deleted as a duplicate, or no deduplication was performed). Deduplicating the stripe data before the stripe is closed and destaged thus saves disk space.
For each stripe in the cache, in addition to deduplication, the valid data in the stripe can be compressed before the stripe is destaged. The compression may be performed after the deduplication operation. The compression ratio D is the ratio of the data amounts before and after stripe compression in the cache over a past statistical period. Theoretically, the minimum value of the compression ratio D is 1 (a compression operation was performed but no data was successfully compressed, or no compression operation was performed). Compressing the stripe data before the stripe is destaged saves disk space. The stripes included in the statistics may be all stripes in the cache, or only some of them, for example the stripes whose destaged data belongs to a particular LUN.
The average stripe filling rate I is the average, over a past statistical period, of the ratio of the valid data in each stripe to the total stripe length. When computing I, the valid data counted may be the valid data remaining after the compression and deduplication operations have been performed. The stripes included in the statistics may be all stripes in the cache, or only some of them, for example the stripes whose destaged data belongs to a particular LUN.
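As an illustrative sketch (not part of the patent text), the three statistics could be maintained by a routine like the following over a window of recently destaged stripes; the field names and the window representation are assumptions:

```python
def stripe_stats(stripes):
    """Compute (D, C, I) over a window of recently destaged stripes.

    Each stripe is a dict with hypothetical fields:
      raw:     bytes of valid data before deduplication
      deduped: bytes remaining after deduplication
      stored:  bytes actually written after compression
      length:  total stripe length in bytes
    """
    total_raw = sum(s["raw"] for s in stripes)
    total_deduped = sum(s["deduped"] for s in stripes)
    total_stored = sum(s["stored"] for s in stripes)
    c = total_raw / total_deduped      # deduplication ratio C, >= 1
    d = total_deduped / total_stored   # compression ratio D, >= 1
    # Average stripe filling rate I: mean ratio of valid data
    # (after deduplication and compression) to total stripe length.
    i = sum(s["stored"] / s["length"] for s in stripes) / len(stripes)
    return d, c, i
```

With an empty history all three values would simply be initialized to 1, matching the initialization described above.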
It should be noted that the embodiments of the present invention involve two concepts: memory and cache. The memory in the embodiments is, for example, a RAM; it temporarily stores operational data for the CPU and provides space for the computer to run programs. The cache may be part of the memory or may be physically separate from it. In the embodiments, the cache is used for temporarily storing stripes; it may also be a RAM, or another medium such as a solid-state drive (SSD).
The parameter acquisition module 31 is further configured to acquire the size S of the target stripe in the cache and the size W of the valid data in the target stripe.
The target stripe size S may be described by the length of the target stripe, for example in bytes, or by a count of fixed-size units, such as the number of cache pages the target stripe occupies.
Similarly, the valid data size W in the target stripe can be described by its length in bytes, or by a count of fixed-size units. W should be described in the same way as S.
The calculation module 32 is configured to obtain the evicted data amount N according to the formula N = (S × D × C − W) / I.
The compression ratio D and the deduplication ratio C are optional. If no compression is performed, there is no compression ratio D; if no deduplication is performed, there is no deduplication ratio C. In that case the corresponding factor can be taken as 1 in the formula.
When data in the cache is processed for the first time using the present embodiment, the defaults D = 1, C = 1, and I = 1 may be used, because no stripes have been processed yet.
Clearly, N has the same unit as S and W: when S is in bytes, N is also in bytes. The actual value of N may be rounded to an integer multiple of the minimum destage unit, so that it differs from (S × D × C − W) / I by less than one minimum unit. For example, if the minimum destage unit is 10 bytes and (S × D × C − W) / I evaluates to 54 bytes, one option is N = 50 bytes; another option is N = 60 bytes.
When S is described by a number of cache pages, N is also a number of cache pages. In this case, if (S × D × C − W) / I evaluates to a non-integer, the actual value of N may be obtained by rounding. One option is to round down: for example, if (S × D × C − W) / I evaluates to 10.3 cache pages, N may be 10 cache pages. In another embodiment, the result may be rounded up, so that 10.3 cache pages becomes 11 cache pages.
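The rounding rules just described can be sketched as follows (a hypothetical helper; `unit` is the minimum destage unit, and is 1 when S and W are counted in cache pages):

```python
import math

def evicted_amount(s, w, d, c, i, unit=1, round_up=False):
    """Compute N = (S*D*C - W)/I, rounded to a multiple of `unit`.

    s, w and the result share one unit (bytes or cache pages);
    `unit` is the minimum destage unit expressed in that same unit.
    """
    n = (s * d * c - w) / i
    rounder = math.ceil if round_up else math.floor
    return rounder(n / unit) * unit
```

With the figures from the text: a 54-byte result and a 10-byte minimum destage unit round down to 50 bytes or up to 60 bytes.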
The data processing module 33 performs the corresponding operation according to the value of the evicted data amount N.
(1) When Tmin ≤ N ≤ Tmax, data of size N is written into the target stripe, and the data of the target stripe is then evicted to a persistent storage medium (an SSD, a hard disk, etc.). After the data of size N has been written, the target stripe holds its original data plus the newly written data of size N.
Here Tmin is a preset first threshold and Tmax is a preset second threshold, both in the same unit as N. Tmin and Tmax can be set differently in different scenarios according to different requirements, and are therefore not limited here. For example, Tmin may be 10% of the stripe length × C × D, and Tmax may be 80% of the stripe length × C × D.
(2) When N > Tmax, data of size N is written into the target stripe. The difference from (1) is that the destage operation is temporarily not performed on the target stripe; new data is subsequently written to the stripe, and the stripe is evicted to the persistent storage medium once it is full.
(3) When N < Tmin, the data of the target stripe is evicted to a persistent storage medium. The difference from (1) is that no data of size N is written into the target stripe; the target stripe is destaged directly.
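Taken together, cases (1) to (3) amount to a single dispatch on N, which can be sketched as follows (`write_to_stripe` and `evict_stripe` are hypothetical stand-ins for the cache-pool operations, not names from the patent):

```python
def process_target_stripe(n, t_min, t_max, write_to_stripe, evict_stripe):
    """Dispatch on the evicted data amount N against the two thresholds."""
    if t_min <= n <= t_max:
        # Case (1): top the stripe up with data of size N, then destage it.
        write_to_stripe(n)
        evict_stripe()
    elif n > t_max:
        # Case (2): write data of size N now; destage later, once full.
        write_to_stripe(n)
    else:
        # Case (3): N < Tmin -- destage directly, writing nothing more.
        evict_stripe()
```

With the worked example later in the text (N = 1137, Tmin = 128, Tmax = 4K), the first branch is taken: the stripe is topped up with 1137 cache pages and then destaged.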
It should be noted that both (1) and (2) above involve writing cached data into the stripe. Logically, such a write operation moves data that did not previously belong to the target stripe into it. After the write operation, the written data is still physically in the cache; at the logical level, however, the data has been written from the cache layer into the storage pool. The cache layer provides reads and writes to the host and mainly interacts with the host; the storage pool temporarily stores data to be written to the hard disk and mainly interacts with the hard disk. They are thus two different functional layers. The operations of modules 31, 32, and 33 are performed at the storage pool layer.
For ease of understanding, the above embodiments are described below with a specific example.
For example, the following is obtained from the memory: the current deduplication ratio C is 2, the current compression ratio D is 4, and the current statistical average stripe filling rate I is 0.9. The cache page size is 4 KB and the stripe size is 8 MB, i.e. S = 2K cache pages (8 MB / 4 KB). 15K cache pages of valid data have already been written for the current stripe, so W = 15K. Assume Tmin = 128 and Tmax = 4K.
According to the formula, N = (S × D × C − W) / I = (2K × 4 × 2 − 15K) / 0.9 ≈ 1137.8. Rounding down gives N = 1137.
Situation (1) is met, since Tmin < N < Tmax. Therefore, according to the embodiment of the present invention, data of 1137 cache pages is written into the current stripe, and then all the data in the current stripe is sent to the hard disk. After the sending is finished, the storage space occupied by the target stripe in the cache is released. In this example, the number of cache pages is used as the unit of S, because it is common practice to send data from the cache to the hard disk at cache-page granularity. Other embodiments may use other schemes, such as using a multiple of the cache page as the unit of S. Still other embodiments may have no concept of cache pages at all, instead using the minimum destage granularity, or an integer multiple of it, as the unit of the stripe size. Alternatively, the stripe size in bytes may be used directly as S, with the value of N converted into a number of cache pages after it has been calculated.
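The arithmetic of this example can be checked directly (a sketch in which S and W are counted in cache pages, with 2K = 2048 and 15K = 15360):

```python
import math

# Values from the example: C = 2 (deduplication), D = 4 (compression), I = 0.9.
S, W = 2 * 1024, 15 * 1024        # in cache pages
D, C, I = 4, 2, 0.9
T_MIN, T_MAX = 128, 4 * 1024

n_exact = (S * D * C - W) / I     # (16384 - 15360) / 0.9, about 1137.8
n = math.floor(n_exact)           # round down to whole cache pages
assert n == 1137
assert T_MIN < n < T_MAX          # situation (1): write 1137 pages, then destage
```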
The elements of each example and the algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and the changes or substitutions based on the present invention should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for processing cache data, the method comprising:
acquiring a compression ratio D, a deduplication ratio C and an average stripe filling rate I of data, and acquiring a target stripe size S in a cache and an effective data size W in the target stripe;
obtaining an evicted data amount N according to the formula N = (S × D × C − W) / I, and executing one of the following operations according to the value of N:
when Tmin ≤ N ≤ Tmax, writing data of size N into the target stripe, and evicting the data of the target stripe to a storage medium, wherein Tmin is a preset first threshold and Tmax is a preset second threshold;
when N > Tmax, writing data of size N into the target stripe;
when N < Tmin, evicting the data of the target stripe to the storage medium.
2. The cache data processing method according to claim 1, wherein after writing the data of size N into the target stripe, the method further comprises:
evicting the target stripe to the storage medium after the target stripe has been filled with subsequently written data.
3. The cache data processing method according to claim 1, wherein:
the initial values of D, C and I are both 1.
4. The cache data processing method of claim 1, wherein:
S is described by a number of cache pages.
5. A cache data processing apparatus, comprising:
an acquisition module, configured to acquire a compression ratio D, a deduplication ratio C and an average stripe filling rate I of data, and further configured to acquire a target stripe size S in a cache and an effective data size W in the target stripe;
a calculation module, configured to obtain an evicted data amount N according to the formula N = (S × D × C − W) / I;
a data processing module, configured to perform one of the following operations according to the value of N:
when Tmin ≤ N ≤ Tmax, writing data of size N into the target stripe, and evicting the data of the target stripe to a storage medium, wherein Tmin is a preset first threshold and Tmax is a preset second threshold;
when N > Tmax, writing data of size N into the target stripe;
when N < Tmin, evicting the data of the target stripe to the storage medium.
6. The cache data processing apparatus of claim 5, wherein the data processing module is further configured to:
evict the target stripe to the storage medium after data of size N has been written into the target stripe and the target stripe has been filled with subsequently written data.
7. The cache data processing apparatus of claim 5, wherein:
the initial values of D, C and I are both 1.
8. The cache data processing apparatus of claim 5, wherein:
S is described by a number of cache pages.
9. A storage controller, comprising a processor and a cache for temporarily storing data, wherein the processor, by running a program, performs the following:
the method comprises the steps of obtaining a compression ratio D, a deduplication ratio C and an average stripe filling rate I of data, and also obtaining a target stripe size S in a cache and an effective data size W in the target stripe;
obtaining an evicted data amount N according to the formula N = (S × D × C − W) / I;
and executing one of the following operations according to the value of N: at Tmin≤N≤TmaxWriting data of size N into the target stripe, and eliminating the data of the target stripe to the storage medium, wherein when T isminIs a preset first threshold value, TmaxIs a preset second threshold; at N > TmaxWriting data of size N into the target stripe; at N < TminIn case of (2), evicting the target stripe data to the storage medium.
10. The storage controller of claim 9, wherein the processor is further configured to:
evict the target stripe to the storage medium after data of size N has been written into the target stripe and the target stripe has been filled with subsequently written data.
11. A storage medium, wherein
the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 4.
CN201611248587.3A 2016-12-29 2016-12-29 Cache data processing method and device and storage controller Active CN106648469B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201611248587.3A CN106648469B (en) 2016-12-29 2016-12-29 Cache data processing method and device and storage controller
PCT/CN2017/118147 WO2018121455A1 (en) 2016-12-29 2017-12-23 Cached-data processing method and device, and storage controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611248587.3A CN106648469B (en) 2016-12-29 2016-12-29 Cache data processing method and device and storage controller

Publications (2)

Publication Number Publication Date
CN106648469A CN106648469A (en) 2017-05-10
CN106648469B true CN106648469B (en) 2020-01-17

Family

ID=58836364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611248587.3A Active CN106648469B (en) 2016-12-29 2016-12-29 Cache data processing method and device and storage controller

Country Status (2)

Country Link
CN (1) CN106648469B (en)
WO (1) WO2018121455A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648469B (en) * 2016-12-29 2020-01-17 华为技术有限公司 Cache data processing method and device and storage controller
CN107153619A (en) * 2017-06-14 2017-09-12 湖南国科微电子股份有限公司 Solid state hard disc data cache method and device
CN109196458B (en) * 2017-11-03 2020-12-01 华为技术有限公司 Storage system available capacity calculation method and device
CN109086172B (en) * 2018-09-21 2022-12-06 郑州云海信息技术有限公司 Data processing method and related device
CN110222048B (en) * 2019-05-06 2023-06-23 平安科技(深圳)有限公司 Sequence generation method, device, computer equipment and storage medium
CN110765031B (en) * 2019-09-27 2022-08-12 Oppo广东移动通信有限公司 Data storage method and device, mobile terminal and storage medium
CN111124307B (en) * 2019-12-20 2022-06-07 北京浪潮数据技术有限公司 Data downloading and brushing method, device, equipment and readable storage medium
CN112799978B (en) * 2021-01-20 2023-03-21 网易(杭州)网络有限公司 Cache design management method, device, equipment and computer readable storage medium
CN113726341B (en) * 2021-08-25 2023-09-01 杭州海康威视数字技术股份有限公司 Data processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023810A (en) * 2009-09-10 2011-04-20 成都市华为赛门铁克科技有限公司 Method and device for writing data and redundant array of inexpensive disk
CN102223510A (en) * 2011-06-03 2011-10-19 杭州华三通信技术有限公司 Method and device for scheduling cache
CN103729149A (en) * 2013-12-31 2014-04-16 创新科存储技术有限公司 Data storage method
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2366014B (en) * 2000-08-19 2004-10-13 Ibm Free space collection in information storage systems
US8060715B2 (en) * 2009-03-31 2011-11-15 Symantec Corporation Systems and methods for controlling initialization of a fingerprint cache for data deduplication
CN103902465B (en) * 2014-03-19 2017-02-08 华为技术有限公司 Method and system for recycling solid state disk junk and solid state disk controller
CN106648469B (en) * 2016-12-29 2020-01-17 华为技术有限公司 Cache data processing method and device and storage controller

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023810A (en) * 2009-09-10 2011-04-20 成都市华为赛门铁克科技有限公司 Method and device for writing data and redundant array of inexpensive disk
CN102223510A (en) * 2011-06-03 2011-10-19 杭州华三通信技术有限公司 Method and device for scheduling cache
CN103729149A (en) * 2013-12-31 2014-04-16 创新科存储技术有限公司 Data storage method
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system

Also Published As

Publication number Publication date
WO2018121455A1 (en) 2018-07-05
CN106648469A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106648469B (en) Cache data processing method and device and storage controller
US20170185512A1 (en) Specializing i/o access patterns for flash storage
US9916248B2 (en) Storage device and method for controlling storage device with compressed and uncompressed volumes and storing compressed data in cache
US9448924B2 (en) Flash optimized, log-structured layer of a file system
JP5944587B2 (en) Computer system and control method
US10671309B1 (en) Predicting usage for automated storage tiering
US9965381B1 (en) Indentifying data for placement in a storage system
CN108604165B (en) Storage device
WO2012090239A1 (en) Storage system and management method of control information therein
US20150363134A1 (en) Storage apparatus and data management
CN111427855B (en) Method for deleting repeated data in storage system, storage system and controller
US10296229B2 (en) Storage apparatus
JP6711121B2 (en) Information processing apparatus, cache memory control method, and cache memory control program
US11093134B2 (en) Storage device, management method, and program in tiered storage system
US11630779B2 (en) Hybrid storage device with three-level memory mapping
US11086793B2 (en) Data reduction techniques for use with caching
US9699254B2 (en) Computer system, cache management method, and computer
JP7146054B2 (en) System controller and system garbage collection methods
US10891057B1 (en) Optimizing flash device write operations
US11579786B2 (en) Architecture utilizing a middle map between logical to physical address mapping to support metadata updates for dynamic block relocation
US9864688B1 (en) Discarding cached data before cache flush
JP2014010604A (en) Storage device, program, and method
US10853257B1 (en) Zero detection within sub-track compression domains
US11740792B2 (en) Techniques for data storage management
US11494303B1 (en) Data storage system with adaptive, memory-efficient cache flushing structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant