CN115686366A - Write data caching acceleration method based on RAID - Google Patents

Publication number
CN115686366A
Authority
CN
China
Prior art keywords
data
raid
cache
cache line
physical disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211305875.3A
Other languages
Chinese (zh)
Inventor
何全
周津
曾永红
付彦淇
仇旭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Jinhang Computing Technology Research Institute
Original Assignee
Tianjin Jinhang Computing Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Jinhang Computing Technology Research Institute
Priority to CN202211305875.3A
Publication of CN115686366A

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a RAID (redundant array of independent disks) based write data caching acceleration method, belonging to the field of data transmission. On receiving a write operation from the upper computer, the Cache module does not write the new data to the physical disk immediately but copies it into the CacheLine space for temporary storage. When the proportion of sectors holding data to be updated reaches the write-back threshold of the physical disk, the old data on the physical disk is read back and written into the CacheLine, a check value is generated from the merged data, and the merged data and the check value are written back to the RAID physical disk together. The invention can improve RAID write performance and maximize utilization of disk bandwidth.

Description

Write data caching acceleration method based on RAID
Technical Field
The invention belongs to the field of data transmission, and particularly relates to a write data caching acceleration method based on RAID.
Background
With the advent of the big data era, more and more users participate in the internet, and the volume of data on the internet is growing exponentially. A Redundant Array of Independent Disks (RAID), or disk array for short, uses storage virtualization to combine multiple disks into one or more disk array groups, with the aim of improving performance, data redundancy, or both. In short, RAID combines multiple disks into one logical disk, so the operating system treats the array as a single physical disk. RAID exploits the parallelism among multiple disks to greatly improve read and write performance, and it can mirror data or compute check values across the disks to provide data redundancy or error correction. Because of its high performance and good fault tolerance, RAID is widely used in enterprise servers, and it has good prospects for processing and storing massive data on the server side.
However, frequent small writes are a harsh workload for RAID. RAID often provides strong fault tolerance by adding check values, but every time data is updated the check value must be recalculated and written back to the RAID physical disks. The extra write overhead of updating the check value degrades the overall write performance of the RAID. In addition, when RAID is applied to SSDs, the frequent writes caused by check value updates can accelerate wear of the flash memory chips.
As regards RAID performance, RAID1 is known to provide relatively good read and write performance. RAID5 adds redundancy and reads well but writes poorly, and RAID6 offers stronger data protection but also writes poorly. The main reason for the poor write performance of RAID5 and RAID6 is the write amplification caused by small writes; this problem is described below using RAID5 as an example. First, the data organization hierarchy of a RAID system is described. The smallest unit a RAID system can operate on is a sector, which is typically 512 B and in some cases 4 KB. RAID takes a run of consecutive sectors of the same length from each disk to form a partition unit called a strip, and the corresponding strips on all member disks form a complete stripe. For example, stripe A in the following figure includes 4 strips: A1, A2, A3 and A4.
RAID5 does not keep a backup copy of the data. Instead, it stores the data and the corresponding parity information across the disks that make up the array, with the parity information and the data it protects placed on different disks. When one disk of a RAID5 array is damaged, the lost data can be recovered from the remaining data and the corresponding parity information. The following figure shows a RAID5 array composed of 4 disks (at least 3 disks are required); in this 3+1 stripe, 3 disks store data and 1 disk stores parity.
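As a minimal illustration of the recovery property described above, the following Python sketch assumes the check value is a bytewise XOR of the data strips. That is the conventional RAID5 parity function, but the text itself does not spell out the check function. The helper name `xor_strips` and the strip contents are made up for the example.

```python
from functools import reduce

def xor_strips(strips):
    """Bytewise XOR of equally sized strips (assumed RAID5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# A 3+1 stripe: three data strips and one parity strip (contents are made up).
a1 = b"\x01\x02\x03\x04"
a2 = b"\x10\x20\x30\x40"
a3 = b"\xaa\xbb\xcc\xdd"
ap = xor_strips([a1, a2, a3])

# Suppose the disk holding A2 fails: XOR the surviving strips with the parity.
recovered_a2 = xor_strips([a1, a3, ap])
```

Because XOR is its own inverse, any single missing strip of the stripe can be rebuilt the same way from the remaining three.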
During a RAID5 write, a new check value must be calculated and written back to the device whenever the related data is updated, which causes additional write amplification. As shown in FIG. 2, when RAID5 modifies B1, A2 and C3, the stripe check values Ap, Bp and Cp must be modified as well, so 6 write operations are required.
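The count of six writes can be reproduced with a small sketch. It assumes XOR parity and the usual read-modify-write update rule (new parity = old parity XOR old data XOR new data); neither is stated in the text. `ToyRaid5` and its counter are hypothetical names, and a single integer stands in for each strip.

```python
class ToyRaid5:
    """Toy RAID5 model that counts physical writes; one int stands in for a strip."""

    def __init__(self, stripes):
        self.data = {name: list(strips) for name, strips in stripes.items()}
        self.parity = {name: self._xor(strips) for name, strips in self.data.items()}
        self.writes = 0

    @staticmethod
    def _xor(values):
        result = 0
        for value in values:
            result ^= value
        return result

    def small_write(self, stripe, index, new_value):
        """Update one strip and its parity (assumed read-modify-write rule)."""
        old_value = self.data[stripe][index]
        self.parity[stripe] ^= old_value ^ new_value
        self.data[stripe][index] = new_value
        self.writes += 2  # one data strip plus one parity strip

array = ToyRaid5({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
array.small_write("B", 0, 40)  # modify B1, which also rewrites Bp
array.small_write("A", 1, 20)  # modify A2, which also rewrites Ap
array.small_write("C", 2, 90)  # modify C3, which also rewrites Cp
```

Three one-strip updates in three different stripes cost six physical writes in this model, which is the amplification the caching scheme aims to reduce.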
The present invention has been made in view of the above background. It mainly addresses the write amplification caused by small writes in RAID and designs a hardware acceleration scheme that reduces RAID write amplification, thereby improving RAID write performance and maximizing utilization of disk bandwidth.
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is how to provide a RAID (redundant array of independent disks) based write data caching acceleration method that solves the write amplification caused by small writes in RAID.
(II) Technical solution
In order to solve the technical problem, the invention provides a write data caching acceleration method based on RAID, which comprises the following steps:
S1, after detecting a new write operation from the upper computer, the CACHE_CAL module obtains the target address and the number of sectors accessed by the upper computer, aligns and splits the write operation into a plurality of CacheLines according to the CacheLine size, numbers the split CacheLines 1, 2, ..., N, and sets k = 1;
S2, the CACHE_CAL module determines whether the CacheLine address numbered k is already in the DDR; if so, the existing CacheLine is operated on directly; if not, an empty CacheLine is found and occupied in the DDR, and that CacheLine is operated on;
S3, the CACHE_CAL module copies the new data of the upper computer into the CacheLine, breaks the write operation down into sector operations on that CacheLine, and writes 1 to the flag bits in the NewDataRAM of all sectors covered by the new data, each written sector being represented by 1 bit;
S4, the CACHE_CAL module calculates the proportion of sectors of the CacheLine that are marked 1 in the NewDataRAM, that is, sectors holding data to be updated; if the proportion reaches the write-back threshold of the physical disk, step S5 is performed; otherwise the value of k is checked: if k = N, the method returns to step S1, and if k < N, k is set to k + 1 and step S2 is performed;
S5, the SATA module reads the data corresponding to the CacheLine from the RAID physical disk;
S6, the MERGE module writes the data obtained in step S5 into the CacheLine in the DDR; during writing it queries the state of the corresponding sector in the NewDataRAM, discarding the write to a sector whose bit is '1' and writing the sector otherwise;
S7, the CACHE_CAL module calculates the check value of the CacheLine and writes it into the check position of the CacheLine;
S8, the SATA module writes the data in the CacheLine to the RAID physical disk;
S9, the CACHE_CAL module sets the occupied CacheLine to the empty state and sets the corresponding sector flags in the NewDataRAM to 0;
S10, the value of k is checked: if k = N, the method returns to step S1; if k < N, k is set to k + 1 and step S2 is performed.
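The alignment and splitting performed in step S1 can be sketched as follows. This is only an illustration under stated assumptions: addresses are expressed in sectors, a CacheLine holds 8 sectors, and the function name `split_write` and its return format are hypothetical.

```python
def split_write(start_sector, sector_count, line_sectors=8):
    """Split a host write into CacheLine-aligned pieces.

    Returns a list of (line_index, first_sector_in_line, sectors_in_piece).
    line_sectors=8 is an assumed CacheLine size, not a value from the text.
    """
    pieces = []
    sector = start_sector
    end = start_sector + sector_count
    while sector < end:
        line_index, offset = divmod(sector, line_sectors)
        # A piece stops at the CacheLine boundary or at the end of the write.
        length = min(line_sectors - offset, end - sector)
        pieces.append((line_index, offset, length))
        sector += length
    return pieces

# A 12-sector write starting at sector 6 touches three CacheLines, so N = 3.
pieces = split_write(start_sector=6, sector_count=12)
```

Each tuple corresponds to one numbered CacheLine k in steps S2 to S10.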
Further, the method is applied to a Cache module, which comprises a CACHE_CAL module, a SATA module, a MERGE module, CacheLines and a NewDataRAM.
Further, the CACHE_CAL module performs parsing of the upper computer's write operation, allocation of CacheLines, control of reading data back from the RAID physical disk, merging of CacheLine data and check value calculation, control of writing data to the RAID physical disk, and calculation of key data.
Further, the SATA module is a SATA interface controller module that completes data transmission from the CacheLine to the RAID physical disk and data transmission from the RAID physical disk to the SATA module.
Further, a CacheLine is a segment of DDR space the size of multiple sectors.
Further, the NewDataRAM is a block of data storage space that stores the new or old state of each sector's data.
Further, the MERGE module is configured to write the RAID physical disk data obtained by the SATA module into the CacheLine in the DDR according to the state of the NewDataRAM.
Further, the minimum unit of a write operation is 1 sector.
Further, when the proportion of sectors holding data to be updated in the CacheLine reaches the write-back threshold of the physical disk, the old data on the physical disk is read back and written into the CacheLine without overwriting the data to be updated, a check value is generated from the merged data, and the merged data and the check value are written back to the RAID physical disk.
Further, when the old data on the physical disk is read into the CacheLine, the sectors holding data to be updated are skipped; the read can fetch all old data of CacheLine capacity from the physical disk into the CacheLine in one operation, and before the old data is written into the CacheLine, all writes to sectors holding data to be updated are discarded.
(III) Advantageous effects
The invention provides a RAID (redundant array of independent disks) based write data caching acceleration method that solves the write amplification caused by small writes in RAID. It designs a hardware acceleration scheme that reduces RAID write amplification, thereby improving RAID write performance and maximizing utilization of disk bandwidth.
Drawings
FIG. 1 is a diagram of RAID5 (data striping with distributed parity) in the prior art;
FIG. 2 is an example of RAID5 small writes in the prior art;
FIG. 3 is a flow chart of the method of the present invention;
FIG. 4 is a block diagram of the system of the present invention.
Detailed Description
In order to make the objects, contents and advantages of the present invention more apparent, the following detailed description of the present invention will be made in conjunction with the accompanying drawings and examples.
The RAID cache acceleration method of the invention is implemented by the Cache module. Because the read/write access speed of DDR (usually on the order of 10 GB/s) is far higher than that of a RAID physical disk (usually on the order of 100 MB/s), the Cache module uses cache management in DDR to reorganize writes to the RAID physical disk and raises write speed by reducing the number of write operations issued to the disk. The Cache module manages the DDR in units of CacheLines. A CacheLine is a segment of DDR storage managed by the Cache module; it consists of a plurality of sectors, holds a run of contiguous data together with the check value for that data, and corresponds functionally to a stripe on the RAID physical disk.
When it receives a write operation from the upper computer (a device or module that stores data using the RAID), the Cache module does not write the new data to the physical disk immediately but copies it into the CacheLine space for temporary storage. A single write from the upper computer usually does not fill the whole CacheLine, so it is necessary to record which sectors of the CacheLine have been written by the upper computer and are therefore inconsistent with the data on the physical disk. This inconsistent data is the "data to be updated" and is eventually written to the physical disk.
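The per-sector record described above can be pictured as a bitmap with one bit per sector, which is how the NewDataRAM is characterized elsewhere in the text. The following sketch is illustrative only: the class name `NewDataBitmap`, its methods, and the 8-sector CacheLine are assumptions.

```python
class NewDataBitmap:
    """One bit per sector of a CacheLine; 1 marks 'data to be updated'."""

    def __init__(self, line_sectors):
        self.line_sectors = line_sectors
        self.bits = 0  # all sectors start as 0 (consistent with the disk)

    def mark(self, offset, length):
        """Set the bits of `length` sectors starting at sector `offset`."""
        if offset < 0 or length < 0 or offset + length > self.line_sectors:
            raise ValueError("sector range outside the CacheLine")
        self.bits |= ((1 << length) - 1) << offset

    def is_new(self, sector):
        return bool((self.bits >> sector) & 1)

    def count(self):
        return bin(self.bits).count("1")

    def clear(self):
        self.bits = 0

bitmap = NewDataBitmap(line_sectors=8)
bitmap.mark(offset=2, length=3)  # host wrote sectors 2, 3, 4
bitmap.mark(offset=4, length=2)  # overlapping write to sectors 4, 5
```

Overlapping host writes set the same bit again, so a sector is counted once no matter how many times it is rewritten before write-back.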
During RAID5 operation, the write data of the upper computer is held temporarily in the DDR. To reduce the load on the DDR, the check value is not updated every time new data is received; instead it is generated once, when the proportion of sectors holding "data to be updated" in the CacheLine reaches the write-back threshold of the physical disk.
After a period of time, when the proportion of sectors holding "data to be updated" in the CacheLine reaches the write-back threshold of the physical disk, the old data on the physical disk is read back and written into the CacheLine without overwriting the "data to be updated", a check value is generated from the merged data, and the merged data and the check value are written back to the RAID physical disk.
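The threshold test that triggers write-back can be sketched as a simple ratio check. The threshold of 0.5 and the 8-sector CacheLine are assumed values chosen for the example; the text does not give concrete numbers, and `should_write_back` is a hypothetical name.

```python
def should_write_back(new_sector_bits, line_sectors, threshold=0.5):
    """True when the fraction of marked sectors reaches the threshold.

    threshold=0.5 is an assumed value; the text does not give a number.
    """
    marked = bin(new_sector_bits).count("1")
    return marked / line_sectors >= threshold

# An 8-sector CacheLine receiving four single-sector host writes in turn.
bits = 0
decisions = []
for sector in (1, 4, 6, 7):
    bits |= 1 << sector
    decisions.append(should_write_back(bits, line_sectors=8))
```

In this run the first three writes are only buffered, and the fourth brings the marked fraction to 4/8, so a single check value calculation and write-back covers all four updates.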
When old data on the physical disk is read into the CacheLine, the sectors holding "data to be updated" must be skipped. Because it is not known in advance how contiguous or scattered the "data to be updated" is, reading back only the old data outside those sectors would generate multiple read operations on the SATA interface. To improve the bandwidth utilization of the SATA interface, the read instead fetches all old data of CacheLine capacity from the physical disk into the CacheLine in one operation. This saves read operations and time on the SATA interface and lets each physical disk transfer large packets efficiently. Before the old data is written into the CacheLine, all writes to sectors holding "data to be updated" are discarded, which merges the new data with the old.
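The merge described above, one full-CacheLine read followed by a per-sector filter, might look like the following sketch. The function name `merge_old_data`, the bitmap encoding (bit i set means sector i holds new data), and the 4-byte sector size are assumptions made to keep the example short.

```python
def merge_old_data(cache_line, old_line, new_sector_bits, sector_size):
    """Copy old disk data into the CacheLine, skipping sectors marked as new.

    cache_line: bytearray holding host data in the marked sectors.
    old_line:   bytes read from the physical disk in a single request.
    """
    if len(cache_line) != len(old_line):
        raise ValueError("CacheLine and read-back data must be the same size")
    for sector in range(len(cache_line) // sector_size):
        if (new_sector_bits >> sector) & 1:
            continue  # discard the write: this sector holds data to be updated
        start = sector * sector_size
        cache_line[start:start + sector_size] = old_line[start:start + sector_size]
    return cache_line

SECTOR = 4  # tiny sector size, chosen only to keep the example readable
line = bytearray(b"....NEW1....NEW2")  # sectors 1 and 3 were written by the host
old = b"old0old1old2old3"
merged = merge_old_data(line, old, new_sector_bits=0b1010, sector_size=SECTOR)
```

The old data arrives in one request regardless of how the new sectors are scattered, and the filtering happens on the DDR side rather than on the SATA interface.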
FIG. 4 is a block diagram of the system of the present invention. The Cache module described herein comprises a CACHE_CAL module, a SATA module, a MERGE module, CacheLines and a NewDataRAM.
The CACHE_CAL module performs parsing of the upper computer's write operation, allocation of CacheLines, control of reading data back from the RAID physical disk, merging of CacheLine data and check value calculation, control of writing data to the RAID physical disk, and calculation of key data;
the SATA module is a SATA interface controller module that completes data transmission from the CacheLine to the RAID physical disk and data transmission from the RAID physical disk to the SATA module;
a CacheLine is a segment of DDR space the size of multiple sectors;
the NewDataRAM is a block of data storage space that stores the new or old state of each sector's data;
the MERGE module writes the RAID physical disk data obtained by the SATA module into the CacheLine in the DDR according to the state of the NewDataRAM.
As shown in FIG. 3, the method of the present invention comprises the following steps:
S1, after detecting a new write operation from the upper computer, the CACHE_CAL module obtains the target address and the number of sectors accessed by the upper computer, aligns and splits the write operation into a plurality of CacheLines according to the CacheLine size, numbers the split CacheLines 1, 2, ..., N, and sets k = 1;
S2, the CACHE_CAL module determines whether the CacheLine address numbered k is already in the DDR; if so, the existing CacheLine is operated on directly; if not, an empty CacheLine is found and occupied in the DDR, and that CacheLine is operated on;
S3, the CACHE_CAL module copies the new data of the upper computer into the CacheLine, breaks the write operation down into sector operations on that CacheLine, and writes 1 (the default is 0) to the flag bits in the NewDataRAM of all sectors covered by the new data, each written sector being represented by 1 bit;
S4, the CACHE_CAL module calculates the proportion of sectors of the CacheLine that are marked 1 in the NewDataRAM, that is, sectors holding data to be updated; if the proportion reaches the write-back threshold of the physical disk, step S5 is performed; otherwise the value of k is checked: if k = N, the method returns to step S1, and if k < N, k is set to k + 1 and step S2 is performed;
S5, the SATA module reads the data corresponding to the CacheLine from the RAID physical disk;
S6, the MERGE module writes the data obtained in step S5 into the CacheLine in the DDR; during writing it queries the state of the corresponding sector in the NewDataRAM, discarding the write to a sector whose bit is '1' and writing the sector otherwise; here the minimum unit of a write to the physical disk is 1 sector, and one CacheLine is the size of multiple sectors; the CacheLine composition depicted in FIG. 3 does not represent its sector composition;
S7, the CACHE_CAL module calculates the check value of the CacheLine and writes it into the check position of the CacheLine;
S8, the SATA module writes the data in the CacheLine to the RAID physical disk;
S9, the CACHE_CAL module sets the occupied CacheLine to the empty state and sets the corresponding sector flags in the NewDataRAM to 0;
S10, the value of k is checked: if k = N, the method returns to step S1; if k < N, k is set to k + 1 and step S2 is performed.
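Steps S1 to S10 can be tied together in a compact software model. This is a sketch under several assumptions and not the hardware design itself: the disk is a Python dict keyed by CacheLine index, the check value is a bytewise XOR over the sectors of the line, the sector size, CacheLine size and threshold are arbitrary small values, and `WriteCache` with its counters is a hypothetical name.

```python
from functools import reduce

SECTOR = 4        # bytes per sector (tiny, for readability)
LINE_SECTORS = 4  # data sectors per CacheLine (assumed)
THRESHOLD = 0.5   # assumed write-back threshold

class WriteCache:
    def __init__(self, disk):
        self.disk = disk    # {line_index: (data_bytes, check_bytes)}
        self.lines = {}     # occupied CacheLines: line_index -> bytearray
        self.new_bits = {}  # NewDataRAM: line_index -> sector bitmap
        self.disk_reads = 0
        self.disk_writes = 0

    def host_write(self, start_sector, data):
        sector, pos = start_sector, 0
        end = start_sector + len(data) // SECTOR
        while sector < end:  # S1: split the write per CacheLine
            k, offset = divmod(sector, LINE_SECTORS)
            length = min(LINE_SECTORS - offset, end - sector)
            # S2: reuse the CacheLine if present, otherwise occupy an empty one
            line = self.lines.setdefault(k, bytearray(LINE_SECTORS * SECTOR))
            # S3: copy the new data and mark its sectors in the NewDataRAM
            nbytes = length * SECTOR
            line[offset * SECTOR:offset * SECTOR + nbytes] = data[pos:pos + nbytes]
            self.new_bits[k] = self.new_bits.get(k, 0) | (((1 << length) - 1) << offset)
            # S4: write back only when enough sectors are marked
            if bin(self.new_bits[k]).count("1") / LINE_SECTORS >= THRESHOLD:
                self._write_back(k)
            sector += length  # S10: move on to the next CacheLine
            pos += nbytes

    def _write_back(self, k):
        old, _ = self.disk[k]  # S5: one read of the whole line
        self.disk_reads += 1
        line, bits = self.lines[k], self.new_bits[k]
        for s in range(LINE_SECTORS):  # S6: merge, skipping new sectors
            if not (bits >> s) & 1:
                line[s * SECTOR:(s + 1) * SECTOR] = old[s * SECTOR:(s + 1) * SECTOR]
        sectors = [line[s * SECTOR:(s + 1) * SECTOR] for s in range(LINE_SECTORS)]
        # S7: check value, assumed here to be XOR across the sectors
        check = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*sectors))
        self.disk[k] = (bytes(line), check)  # S8: data and check value together
        self.disk_writes += 1
        del self.lines[k]     # S9: release the CacheLine
        del self.new_bits[k]  # and reset its sector flags

disk = {0: (b"aaaabbbbccccdddd", b"\x04" * SECTOR)}
cache = WriteCache(disk)
cache.host_write(1, b"XXXX")  # 1 of 4 sectors marked: stays in the cache
cache.host_write(3, b"YYYY")  # 2 of 4 sectors marked: threshold reached
```

In this run two host writes result in one read-back and one combined write-back, where an uncached RAID5 update would recompute and rewrite the check value for each host write.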
The invention solves the write amplification caused by small writes in RAID and designs a hardware acceleration scheme that reduces RAID write amplification, thereby improving RAID write performance and maximizing utilization of disk bandwidth.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A RAID-based write data caching acceleration method, characterized by comprising the following steps:
S1, after detecting a new write operation from the upper computer, the CACHE_CAL module obtains the target address and the number of sectors accessed by the upper computer, aligns and splits the write operation into a plurality of CacheLines according to the CacheLine size, numbers the split CacheLines 1, 2, ..., N, and sets k = 1;
S2, the CACHE_CAL module determines whether the CacheLine address numbered k is already in the DDR; if so, the existing CacheLine is operated on directly; if not, an empty CacheLine is found and occupied in the DDR, and that CacheLine is operated on;
S3, the CACHE_CAL module copies the new data of the upper computer into the CacheLine, breaks the write operation down into sector operations on that CacheLine, and writes 1 to the flag bits in the NewDataRAM of all sectors covered by the new data, each written sector being represented by 1 bit;
S4, the CACHE_CAL module calculates the proportion of sectors of the CacheLine that are marked 1 in the NewDataRAM, that is, sectors holding data to be updated; if the proportion reaches the write-back threshold of the physical disk, step S5 is performed; otherwise the value of k is checked: if k = N, the method returns to step S1, and if k < N, k is set to k + 1 and step S2 is performed;
S5, the SATA module reads the data corresponding to the CacheLine from the RAID physical disk;
S6, the MERGE module writes the data obtained in step S5 into the CacheLine in the DDR; during writing it queries the state of the corresponding sector in the NewDataRAM, discarding the write to a sector whose bit is '1' and writing the sector otherwise;
S7, the CACHE_CAL module calculates the check value of the CacheLine and writes it into the check position of the CacheLine;
S8, the SATA module writes the data in the CacheLine to the RAID physical disk;
S9, the CACHE_CAL module sets the occupied CacheLine to the empty state and sets the corresponding sector flags in the NewDataRAM to 0;
S10, the value of k is checked: if k = N, the method returns to step S1; if k < N, k is set to k + 1 and step S2 is performed.
2. The RAID-based write data caching acceleration method of claim 1, wherein the method is applied to a Cache module, and the Cache module comprises a CACHE_CAL module, a SATA module, a MERGE module, CacheLines and a NewDataRAM.
3. The RAID-based write data caching acceleration method of claim 2, wherein the CACHE_CAL module performs parsing of the upper computer's write operation, allocation of CacheLines, control of reading data back from the RAID physical disk, merging of CacheLine data and check value calculation, control of writing data to the RAID physical disk, and calculation of key data.
4. The RAID-based write data caching acceleration method of claim 2, wherein the SATA module is a SATA interface controller module that completes data transmission from the CacheLine to the RAID physical disk and data transmission from the RAID physical disk to the SATA module.
5. The RAID-based write data caching acceleration method of claim 2, wherein the CacheLine is a segment of DDR space the size of multiple sectors.
6. The RAID-based write data caching acceleration method of claim 2, wherein the NewDataRAM is a block of data storage space that stores the new or old state of each sector's data.
7. The RAID-based write data caching acceleration method of claim 2, wherein the MERGE module is configured to write the RAID physical disk data obtained by the SATA module into the CacheLine in the DDR according to the state of the NewDataRAM.
8. The RAID-based write data caching acceleration method of claim 1, wherein the minimum unit of a write operation is 1 sector.
9. The RAID-based write data caching acceleration method of any one of claims 1 to 8, wherein when the proportion of sectors holding data to be updated in the CacheLine reaches the write-back threshold of the physical disk, the old data on the physical disk is read back and written into the CacheLine without overwriting the data to be updated, a check value is generated from the merged data, and the merged data and the check value are written back to the RAID physical disk.
10. The RAID-based write data caching acceleration method of claim 9, wherein when the old data on the physical disk is read into the CacheLine, the sectors holding data to be updated are skipped; the read can fetch all old data of CacheLine capacity from the physical disk into the CacheLine in one operation, and before the old data is written into the CacheLine, all writes to sectors holding data to be updated are discarded.
CN202211305875.3A 2022-10-24 2022-10-24 Write data caching acceleration method based on RAID Pending CN115686366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211305875.3A CN115686366A (en) 2022-10-24 2022-10-24 Write data caching acceleration method based on RAID

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211305875.3A CN115686366A (en) 2022-10-24 2022-10-24 Write data caching acceleration method based on RAID

Publications (1)

Publication Number Publication Date
CN115686366A true CN115686366A (en) 2023-02-03

Family

ID=85099508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211305875.3A Pending CN115686366A (en) 2022-10-24 2022-10-24 Write data caching acceleration method based on RAID

Country Status (1)

Country Link
CN (1) CN115686366A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination