Summary of the invention
The object of the invention is to provide a non-volatile cache implementation method that solves the technical problems of the prior art described above: the huge cache-state table produced by cache-line entry management and data-consistency protection in a flash cache, and the resulting poor read/write performance of the control device.
To this end, the present invention proposes a non-volatile cache implementation method: first, the physical flash storage resources are virtualized into a flash storage pool; then three kinds of logical storage units are created on the storage pool: a large cache unit, a small cache unit, and a write-mirror unit. The large cache unit provides conventional caching service; the small cache unit provides acceleration service for random write operations and data staging service for read operations; the write-mirror unit provides redundancy-backup protection for the dirty data in the large cache unit and the small cache unit.
During a data write, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if it misses both the large cache unit and the small cache unit and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data bypasses the flash storage resources and is written directly to the back-end storage cluster.
During a data read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if it misses both the large cache unit and the small cache unit and the acceleration flag is valid, data of the size of a large-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data application unit; if it misses both units and the acceleration flag is invalid but the data-staging flag is valid, the data of the corresponding small-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data application unit; otherwise the data read from the back-end storage cluster is delivered directly to the front-end data application unit without passing through the flash storage resources.
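The write and read decision paths above can be sketched as a minimal model. All class, function, and flag names here (`Unit`, `cache_write`, `accel_flag`, and so on) are illustrative stand-ins and do not appear in the invention itself.

```python
# Minimal model of the write/read decision paths described above.
# All names are illustrative; they do not appear in the text.

class Unit:
    """A cache unit (large or small) modeled as an address -> data map."""
    def __init__(self):
        self.lines = {}
    def hit(self, addr):
        return addr in self.lines
    def get(self, addr):
        return self.lines[addr]
    def put(self, addr, data):
        self.lines[addr] = data
    def load_line(self, addr, backend):
        # Load one cache line's worth of data from the back-end cluster.
        self.lines[addr] = backend.get(addr)

class Backend:
    """Stand-in for the back-end storage cluster."""
    def __init__(self):
        self.store = {}
    def get(self, addr):
        return self.store.get(addr)
    def put(self, addr, data):
        self.store[addr] = data

def cache_write(addr, data, small, large, backend, accel_flag):
    if small.hit(addr):
        small.put(addr, data)          # hit in the small cache unit
    elif large.hit(addr):
        large.put(addr, data)          # miss small, hit large
    elif accel_flag:
        small.put(addr, data)          # double miss, acceleration flag valid
    else:
        backend.put(addr, data)        # bypass flash, write straight to back end

def cache_read(addr, small, large, backend, accel_flag, staging_flag):
    if small.hit(addr):
        return small.get(addr)
    if large.hit(addr):
        return large.get(addr)
    if accel_flag:
        large.load_line(addr, backend)     # load into the large cache unit
        return large.get(addr)
    if staging_flag:
        small.load_line(addr, backend)     # stage into the small cache unit
        return small.get(addr)
    return backend.get(addr)               # serve directly from the back end
```

Note how the final `else` branches realize the "otherwise" clauses: a double miss with both flags invalid never touches the flash units.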
Preferably, the method of the present invention may also have the following technical features:
The sizes of the large cache unit, the small cache unit, and the write-mirror unit satisfy the formula (Little_Size + Mirror_size)/Little_granularity + Big_Size/Big_granularity <= available_DRAM_Size/entry_size, where Big_Size is the size of the large cache unit, Little_Size is the size of the small cache unit, Mirror_size is the size of the write-mirror unit, Little_granularity is the cache-line size of the small cache unit, Big_granularity is the cache-line size of the large cache unit, available_DRAM_Size is the size of the DRAM available to store the cache-state table, and entry_size is the size of each cache-state table entry.
The write-mirror unit is composed of at least one logical write-mirror subunit, and the large cache unit and the small cache unit are composed of at least one logical large-cache subunit and at least one logical small-cache subunit, respectively.
The physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write-mirror unit all span the two or more physical trays.
When data is written to the large cache unit, the small cache unit, and the write-mirror unit, the physical write location of the large cache unit and the physical write location of the write-mirror unit are on different physical trays, and the physical write location of the small cache unit is likewise on a different physical tray from the physical write location of the write-mirror unit.
A single cache line of the small cache unit or of the write-mirror unit is located on one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located on one physical tray or spans two or more physical trays.
Which physical tray a data write operation or data read operation falls on is determined by the following principle: when a physical tray fails, only the operations originally mapped to that tray are migrated to other trays, while the read/write operations originally mapped to the other trays keep their mappings unchanged.
A cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, it transitions only on receiving a cache-line flush request, to the clean state. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving an invalidation request.
A cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, it transitions to the invalid state on receiving a cache-line flush request, and to the frozen state on receiving a move request. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving a read request. When a cache line is in the frozen state, it transitions to the invalid state only on receiving a move-completed notification.
The method may further comprise a daemon unit that flushes, in the background, the dirty data recorded in the write-mirror unit to the back-end storage cluster, so that the amount of dirty data in the flash storage resources that needs redundancy backup stays within a predetermined range.
The redundancy backup is performed by write mirroring.
The physical flash storage resources are flash memory or phase-change memory.
The present invention also proposes a non-volatile cache implementation device, comprising: a flash-storage-resource virtualization unit for virtualizing the physical flash storage resources into a flash storage pool;
a logical-storage-unit creation unit for creating three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write-mirror unit, wherein the large cache unit provides conventional caching service, the small cache unit provides acceleration service for random write operations and data staging service for read operations, and the write-mirror unit provides redundancy-backup protection for the dirty data in the large cache unit and the small cache unit;
and a data write unit and a data read unit.
When the data write unit performs a data write, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if it misses both the large cache unit and the small cache unit and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data bypasses the flash storage resources and is written directly to the back-end storage cluster.
When the data read unit performs a data read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if it misses both the large cache unit and the small cache unit and the acceleration flag is valid, data of the size of a large-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data application unit; if it misses both units and the acceleration flag is invalid but the data-staging flag is valid, the data of the corresponding small-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data application unit; otherwise the data read from the back-end storage cluster is delivered directly to the front-end data application unit without passing through the flash storage resources.
Preferably, the device of the present invention may also have the following technical features:
The sizes of the large cache unit, the small cache unit, and the write-mirror unit satisfy the formula (Little_Size + Mirror_size)/Little_granularity + Big_Size/Big_granularity <= available_DRAM_Size/entry_size, where Big_Size is the size of the large cache unit, Little_Size is the size of the small cache unit, Mirror_size is the size of the write-mirror unit, Little_granularity is the cache-line size of the small cache unit, Big_granularity is the cache-line size of the large cache unit, available_DRAM_Size is the size of the DRAM available to store the cache-state table, and entry_size is the size of each cache-state table entry.
The write-mirror unit may be composed of a plurality of logical write-mirror subunits.
The physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write-mirror unit may all span the two or more physical trays.
When the data write unit writes data to the large cache unit, the small cache unit, and the write-mirror unit, the physical write location of the large cache unit is on a different physical tray from the physical write location of the write-mirror unit, and the physical write location of the small cache unit is likewise on a different physical tray from the physical write location of the write-mirror unit.
A single cache line of the small cache unit or of the write-mirror unit is located on one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located on one physical tray or spans two or more physical trays.
Which physical tray an operation of the data write unit or the data read unit falls on is determined by the following principle: when a physical tray fails, only the operations originally mapped to that tray are migrated to other trays, while the read/write operations originally mapped to the other trays keep their mappings unchanged.
A cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, it transitions only on receiving a cache-line flush request, to the clean state. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving an invalidation request.
A cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, it transitions to the invalid state on receiving a cache-line flush request, and to the frozen state on receiving a move request. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving a read request. When a cache line is in the frozen state, it transitions to the invalid state only on receiving a move-completed notification.
The device may further comprise a daemon unit that flushes, in the background, the dirty data recorded in the write-mirror unit to the back-end storage cluster, so that the amount of dirty data in the flash storage resources that needs redundancy backup stays within a predetermined range.
The redundancy backup is performed by write mirroring.
Compared with the prior art, the beneficial effects of the present invention include: by virtualizing the physical flash storage resources into a flash storage pool, creating three kinds of logical storage units on the storage pool, and adopting the data write and read methods described above, the non-volatile cache implementation method of the present invention avoids the problem of a huge cache-state table and avoids redundancy-backup schemes that severely degrade write performance. It can achieve very large capacity and very high performance, thereby significantly improving the read/write performance of the common control device, and can provide uninterrupted storage service.
Embodiment
The non-volatile memory devices (i.e., the flash storage resources) in the cache implementation method disclosed by the present invention include but are not limited to flash memory, phase-change memory, and the like. The storage system connected at the back end includes but is not limited to the integrated distributed storage system (cluster) 203 shown in Fig. 2; the integrated distributed storage system architecture is used below only to illustrate the present invention.
In the integrated distributed storage system architecture shown in Fig. 2, the flash cache in the common control device needs to offer very large capacity and very high performance (meaning high IOPS and low latency). This is because the storage capacity of the distributed storage cluster connected to the common control device is at the PB level, and the corresponding cache capacity is at the level of hundreds of TB. A flash cache of such capacity, however, faces two difficult problems: cache-line entry management and data-consistency protection.
When flash memory is used as a cache, the entire storage resource must be divided into many cache lines at a certain granularity, and for each cache line the relevant information must be recorded: where the data stored in the cache line comes from, the current state of the cache line, and so on. When the capacity of the flash cache reaches hundreds of TB, for example 200 TB, and cache lines are divided at a granularity of 4 KB, there are 200 TB / 4 KB = 50 × 10^9 cache lines in total. Assuming each cache line needs 16 bytes to record its state, a table of 800 GB is needed to record the state of the whole flash cache, which is an unbearably large table. The 4 KB granularity is determined by the virtual machine 201: as the block storage device of the virtual machine 201, the block access unit for stored data is 4 KB. This is what produces the huge cache-state table and the cache-line entry management problem.
When flash memory is used as a cache, the consistency between the data in the cache and the data in the back-end distributed storage cluster 203 must also be guaranteed; when they are inconsistent, the data in the cache needs backup protection. The most widely used protection scheme at present is RAID5/6, but RAID5/6 comes at the cost of a huge write-performance sacrifice. Another approach is to use the flash only as a read cache: every write operation goes directly to the back-end distributed storage cluster 203 and the related data in the flash cache is set to the invalid state, so that the data in the cache is always consistent with the data in the back-end storage cluster and no backup protection is needed. Such an implementation, however, can only accelerate some read operations and cannot accelerate write operations. This is the data-consistency protection problem and the adverse effects it brings.
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope or application of the invention.
Non-limiting and non-exclusive embodiments will be described with reference to Figs. 1-9, in which like reference numerals denote like parts unless otherwise specified.
Embodiment one:
In a non-volatile cache implementation method, the physical flash storage resources are first virtualized into a flash storage pool, and then three kinds of logical storage units are created on the storage pool: a large cache unit 101, a small cache unit 102, and a write-mirror unit 103, as shown in Fig. 1. The large cache unit 101 provides conventional caching service; the small cache unit 102 provides acceleration service for random write operations and data staging service for read operations; the write-mirror unit 103 provides redundancy-backup protection for the dirty data in the large cache unit 101 and the small cache unit 102.

During a data write, if the write operation hits a cache line of the small cache unit 102, the data is written to the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written to the large cache unit 101; if it misses both the large cache unit 101 and the small cache unit 102 and the acceleration flag is valid, the data is written to the small cache unit 102; otherwise the data bypasses the flash storage resources and is written directly to the back-end storage cluster 203.

During a data read, if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if it misses both units and the acceleration flag is valid, data of the size of a large-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if it misses both units and the acceleration flag is invalid but the data-staging flag is valid, the data of the corresponding small-cache-unit cache line is read from the back-end storage cluster, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise the data read from the back-end storage cluster is delivered directly to the front-end virtual machine 201 without passing through the flash cache 100. The virtual machine 201 here is merely one example of a front-end data application unit; the front-end data application unit in the present invention is not limited thereto.
In the present embodiment, the structure of the physical flash storage resources (also called the flash cache 100) is shown schematically in Fig. 3. Each tray provides physical flash storage resources, and appropriate techniques are used internally to guarantee the reliability and stability of each tray. Dividing the physical flash storage resources into a large cache unit 101 and a small cache unit 102 effectively solves the problem of an oversized cache-state table for an ultra-large-capacity flash cache.
Fig. 4 illustrates the cache-line state table of the large cache unit; the states of a large-cache-unit cache line include but are not limited to those listed in Fig. 4. After simplification, a cache line of the large cache unit has three basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; and the invalid state, in which the cache line holds no valid data. The state transitions are as follows: when a cache line is in the invalid state, a data write request of cache-line size (for example, a write request from the virtual machine 201) moves it to the dirty state, and a clean-data load request (for example, data loaded from the storage system 203) moves it to the clean state; when a cache line is in the dirty state, only a cache-line flush request moves it, to the clean state; when a cache line is in the clean state, a data write request moves it to the dirty state, and an invalidation request moves it to the invalid state.
Fig. 5 illustrates the cache-line state table of the small cache unit; the states of a small-cache-unit cache line include but are not limited to those listed in Fig. 5. After simplification, a cache line of the small cache unit has four basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; the invalid state, in which the cache line holds no valid data; and the frozen state, in which the cache line can only be read and cannot be written. The state transitions are as follows: when a cache line is in the invalid state, a data write request (for example, a write request from the virtual machine 201) moves it to the dirty state, and a clean-data load request (for example, data loaded from the storage system 203) moves it to the clean state; when a cache line is in the dirty state, a cache-line flush request moves it to the invalid state, and a move request moves it to the frozen state; when a cache line is in the clean state, a data write request moves it to the dirty state, and a read request moves it to the invalid state; when a cache line is in the frozen state, only a move-completed notification moves it, to the invalid state.
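The two state machines of Figs. 4 and 5 can be written as small table-driven transition functions. This is a sketch only: the event names (`write`, `clean_load`, `flush`, `move`, `move_done`, `invalidate`, `read`) are illustrative labels for the requests described in the text.

```python
# Table-driven sketch of the large-cache (Fig. 4) and small-cache (Fig. 5)
# cache-line state machines. State and event names are illustrative.

LARGE_TRANSITIONS = {
    ("invalid", "write"):      "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty",   "flush"):      "clean",    # large unit: flushing keeps the line
    ("clean",   "write"):      "dirty",
    ("clean",   "invalidate"): "invalid",
}

SMALL_TRANSITIONS = {
    ("invalid", "write"):      "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty",   "flush"):      "invalid",  # small unit: flushing frees the line
    ("dirty",   "move"):       "frozen",   # move to large unit in progress
    ("clean",   "write"):      "dirty",
    ("clean",   "read"):       "invalid",
    ("frozen",  "move_done"):  "invalid",
}

def next_state(table, state, event):
    """Return the next state; events not in the table leave the state unchanged."""
    return table.get((state, event), state)
```

Note the asymmetry the text describes: flushing a dirty large-unit line leaves it clean (the data stays cached), while flushing a dirty small-unit line invalidates it.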
The different states and transitions of the large and small cache units make it possible to accelerate both read and write operations. In this example, the states and transitions of the large cache unit differ from those of the small cache unit because their service targets differ. Whether the large cache unit or the small cache unit is used when the cache accelerates a read/write access depends on policy information and on the state information of the large and small cache units. Policy information includes but is not limited to the service grade, hit-probability prediction, and so on; it can come directly from the common control device 202 or from the virtual machine 201. State information includes but is not limited to whether an access hits. In this example, the large cache unit provides conventional caching service and can apply different aging policies to different cache lines according to their service grade; the small cache unit provides cache acceleration for write operations that first miss the large cache unit, and provides data staging for read operations that miss the large cache unit.
The cache line of the small cache unit 102 is small, for example 4 KB; the cache line of the large cache unit 101 is large, for example 4 MB; the cache line of the write-mirror unit 103 can match that of the small cache unit 102. The specific cache-line sizes can be adjusted to actual conditions: for example, the cache-line size of the small cache unit 102 can be determined by the storage-request pattern of the virtual machine 201, and the cache-line size of the large cache unit 101 by the implementation of the back-end distributed storage cluster 203.
The sizes of, and relationship among, the small cache unit 102, the large cache unit 101, and the write-mirror unit 103 can be determined by the DRAM resources in the common control device 202. For example, if the entire table recording the cache states must fit in the DRAM of the common control device 202, then (Little_Size + Mirror_size)/Little_granularity + Big_Size/Big_granularity <= available_DRAM_Size/entry_size must hold. Here Little_Size is the size of the small cache unit 102, Mirror_size is the size of the write-mirror unit 103, Little_granularity is the cache-line size of the small cache unit 102 (in this embodiment it matches the block size of the virtual machine 201's data accesses), Big_Size is the size of the large cache unit 101, Big_granularity is the cache-line size of the large cache unit 101, available_DRAM_Size is the size of the DRAM available to store the cache-state table, and entry_size is the size of each cache-state table entry.
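As a worked check of this inequality, the sketch below plugs in the 200 TB capacity, 4 KB granularity, and 16-byte entry size used earlier in the text; the 1 TB small-unit and mirror sizes and the 16 GB DRAM budget are illustrative assumptions.

```python
# Worked check of the cache-state-table sizing inequality.
# Only the 200 TB / 4 KB / 4 MB / 16 B figures come from the text;
# the other concrete values are illustrative.

KB, MB, GB, TB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

little_size = 1 * TB            # small cache unit (illustrative)
mirror_size = 1 * TB            # write-mirror unit (illustrative)
big_size = 200 * TB             # large cache unit
little_granularity = 4 * KB     # small cache line = VM block size
big_granularity = 4 * MB        # large cache line (Fig. 9 example)
entry_size = 16                 # bytes of state per cache line

entries = (little_size + mirror_size) // little_granularity \
          + big_size // big_granularity
table_bytes = entries * entry_size

available_dram = 16 * GB        # DRAM for the state table (illustrative)
print(table_bytes <= available_dram)   # inequality holds for these values
```

With these numbers the state table needs about 8.8 GB, versus the 800 GB required when all 200 TB is divided at 4 KB granularity, which is the motivation for the large/small split.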
The write-mirror unit 103 provides redundancy-backup protection for the dirty data in the large cache unit 101 and the small cache unit 102. Data from the virtual machine 201, while being written to the large cache unit 101 or the small cache unit 102, is also written to the write-mirror unit 103.
A preferred approach further includes a daemon unit responsible for flushing, in the background, the dirty data recorded in the write-mirror unit 103 to the back-end storage cluster 203. Because the write-mirror unit 103 backs up only the dirty data in the large cache unit 101 and the small cache unit 102, and the daemon unit continuously flushes dirty data to the back-end storage cluster 203 according to predetermined rules, the amount of dirty data in the flash cache 100 is bounded, and there is no need to make redundant backups of all the data in the whole flash cache 100. At the same time, because the backup policy uses write mirroring, it both reduces the performance cost of redundancy backup and accelerates all write operations.
Fig. 8 shows the processing flow of the daemon unit. It first checks the state of the write-mirror unit 103; when the write-mirror unit 103 is not empty, the daemon unit takes one dirty-data item and its related information (such as address information) from the write mirror, queries the flash-cache state table according to that information, and obtains the flash-cache state. If the cache state shows a hit in a cache line of the small cache unit 102 but not in the large cache unit 101, the data in the small-cache-unit cache line is flushed directly to the back-end storage cluster 203. If the cache state shows hits in both a small-cache-unit cache line and a large-cache-unit cache line, the data in the small-cache-unit cache line is first moved into the large-cache-unit cache line, and the data in the large-cache-unit cache line is then flushed to the back-end storage cluster 203. If the cache state shows a miss in the small cache unit 102, a hit in a cache line of the large cache unit 101, and dirty data in that large-cache-unit cache line, the data in the large-cache-unit cache line is flushed to the back-end storage cluster 203. If the cache state shows a miss in the small cache unit 102, a hit in the large cache unit 101, and no dirty data in the large-cache-unit cache line, no operation on the large or small cache unit is needed. It should be noted that this is only an example; the flow can be modified according to changes in the state information. The write-mirror unit 103 can also be composed of multiple logical write-mirror subunits, each with its own daemon.
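The daemon's four-case flow in Fig. 8 can be sketched as follows. The `CacheUnit` class and method names are hypothetical stand-ins for the state-table queries described in the text, and the back end is modeled as a plain dictionary.

```python
# Sketch of the write-mirror daemon's per-entry flow (Fig. 8).
# All names are illustrative stand-ins.

class CacheUnit:
    """Cache unit with per-line dirty tracking (illustrative model)."""
    def __init__(self):
        self.lines, self.dirty_lines = {}, set()
    def hit(self, addr):
        return addr in self.lines
    def dirty(self, addr):
        return addr in self.dirty_lines
    def put(self, addr, data, dirty=True):
        self.lines[addr] = data
        if dirty:
            self.dirty_lines.add(addr)
    def flush(self, addr):
        # Flushing cleans the line and hands back its data.
        self.dirty_lines.discard(addr)
        return self.lines[addr]

def drain_one(mirror_queue, small, large, backend):
    """Process one write-mirror entry per the four cases of Fig. 8."""
    if not mirror_queue:
        return
    addr = mirror_queue.pop(0)                 # dirty entry + address info
    if small.hit(addr) and not large.hit(addr):
        backend[addr] = small.flush(addr)      # case 1: flush small line
    elif small.hit(addr) and large.hit(addr):
        large.put(addr, small.flush(addr))     # case 2: move small -> large,
        backend[addr] = large.flush(addr)      #         then flush large line
    elif large.hit(addr) and large.dirty(addr):
        backend[addr] = large.flush(addr)      # case 3: flush dirty large line
    # case 4: large hit but clean -> nothing to do
```

Each write-mirror subunit could run its own loop over `drain_one`, matching the one-daemon-per-subunit arrangement the text describes.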
Fig. 9 illustrates the correspondence between the cache logical units and the physical locations (physical trays). Each logical unit, that is the large cache unit 101, the small cache unit 102, and the write-mirror unit 103, can span all physical trays; the benefit of doing so is higher concurrency across the trays and therefore higher performance. The write-mirror logical unit can be split into multiple small write-mirror logical subunits, for example one logical write-mirror subunit per tray; the benefit of this split is that multiple write-mirror daemon units can run concurrently, increasing the speed at which dirty data is flushed to the back-end storage cluster.
As shown in Fig. 9, new write data from the virtual machine 201 can follow this principle when written to the large cache unit 101, the small cache unit 102, and the write-mirror unit 103: the physical location written in the large or small cache unit and the physical location written in the write-mirror unit 103 are not on the same physical tray. For example, the tray number for the write-mirror unit 103 can be derived from the tray number for the large cache unit 101 or small cache unit 102 by a simple rule such as adding a fixed offset (though the rule is not limited to this). The benefit is that the redundant backup and the original data reside on different physical trays, so that when a single physical tray fails, the flash cache 100 still has a usable copy of the data. The large-cache cache-line size given in Fig. 9 is 4 MB, but in actual use it can be adjusted to actual conditions.
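A minimal sketch of the data-versus-mirror tray-separation rule. The fixed offset of one is only an illustrative choice of the kind of simple rule the text allows, and the tray count is hypothetical.

```python
# Illustrative tray-separation rule: the mirror copy never lands on the
# same tray as the data copy. The +1 offset is one example rule only.

def mirror_tray(data_tray, num_trays):
    """Pick a tray for the write-mirror copy, distinct from the data tray."""
    return (data_tray + 1) % num_trays

num_trays = 4
for tray in range(num_trays):
    # Backup and original data always end up on different physical trays.
    assert mirror_tray(tray, num_trays) != tray
```

Any rule with the same property (mirror tray != data tray) preserves the guarantee that a single tray failure leaves one usable copy.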
A single cache line of the small cache unit 102 or of the write-mirror unit 103 can be located on one physical tray or span two or more physical trays, and a single cache line of the large cache unit can likewise span multiple physical trays or be located on one physical tray. This example describes the case where a single cache line of the large cache unit is located on one physical tray, which makes it easier to keep providing continuous service when a single physical tray fails.
With the large buffer unit, small buffer unit and write-mirror unit partitioned as above, the way continuous service is maintained when one physical tray fails is shown in the following example.
Suppose tray 1 in Figure 9 fails and can no longer provide service, and the write-mirror backup held on tray 1 covers the dirty data on tray 0. Data recovery and continued service then proceed as follows:
Step 1: mark both tray 0 and tray 1 as unable to provide free cache lines.
Step 2: traverse and drain the dirty data, using the following threads:
Thread 1: traverse the cache-line state table of tray 0; invalidate the lines in the clean state, and for the lines in the dirty state first flush the data to the back-end storage cluster and then invalidate them.
Thread 2: traverse the cache-line state table of tray 1; invalidate the lines in the clean state, and for the lines in the dirty state wait until their state becomes clean.
Thread 3: raise the running priority of the write-mirror daemon unit of tray 2 to the highest level.
Threads 1, 2 and 3 execute concurrently.
Step 3: once the traversals of trays 0 and 1 have both finished, set tray 0 back to the state in which it can provide free cache lines, because new writes to tray 0 are now double-copied by its write-mirror unit on tray 2.
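Under simplifying assumptions (in-memory state tables, a caller-supplied `flush` hook, and omission of thread 2's wait-for-clean handling of dirty lines; all names are hypothetical), the concurrent traversal in step 2 might be sketched as:

```python
import threading

CLEAN, DIRTY, INVALID = "clean", "dirty", "invalid"

def recover_after_failure(state_tables, flush, failed=1, mirror_of=0):
    """Sketch of steps 1-3 above for a failure of tray `failed`,
    whose write-mirror area backed the dirty data of tray `mirror_of`.

    state_tables maps tray id -> {line id: state}; flush(tray, line)
    writes one dirty line back to the back-end storage cluster.
    """
    def drain_backed_tray():        # thread 1: the tray whose mirror was lost
        for line, st in state_tables[mirror_of].items():
            if st == DIRTY:
                flush(mirror_of, line)      # clean to the back end first
            state_tables[mirror_of][line] = INVALID

    def drain_failed_tray():        # thread 2: invalidate its clean lines
        for line, st in state_tables[failed].items():
            if st == CLEAN:
                state_tables[failed][line] = INVALID

    t1 = threading.Thread(target=drain_backed_tray)
    t2 = threading.Thread(target=drain_failed_tray)
    t1.start(); t2.start()
    t1.join(); t2.join()            # step 3 (re-enabling tray 0) follows
```

Thread 3 (boosting the daemon priority on the surviving mirror tray) is a scheduler concern and is not modeled here.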
The algorithm that selects which physical tray a read or write operation from the virtual machine lands on is determined by the following principle: when a physical tray fails, only the operations originally mapped to that tray are migrated to other trays, while the read and write operations already mapped to the other trays keep their mapping unchanged. Many existing algorithms satisfy this requirement, for example the CRUSH algorithm.
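As an illustration of that principle (not the CRUSH algorithm itself), rendezvous hashing is one well-known stand-in with the same minimal-remapping property; the function name and key format below are assumptions:

```python
import hashlib

def map_tray(key: str, live_trays):
    """Rendezvous (highest-random-weight) hashing.

    Each key scores every live tray and lands on the highest-scoring
    one. Removing a failed tray remaps only the keys that were on it;
    every other key keeps its original tray, which is exactly the
    stability principle stated above.
    """
    def score(tray):
        digest = hashlib.sha256(f"{key}:{tray}".encode()).hexdigest()
        return int(digest, 16)
    return max(live_trays, key=score)
```

Because each key's per-tray scores are independent of which other trays exist, dropping one tray cannot change the winner among the survivors.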
Beyond solving the basic technical problems of the invention, namely cache-line entry management and data consistency, the inventors also found that because the granularity of the read and write operations from the virtual machine 201 matches the cache-line size of the small buffer unit 102 while the cache lines of the large buffer unit 101 are larger, an operation may hit the large and small buffer units at the same time. This is resolved by the method described below:
As shown in Figures 2 and 6, when a write operation from the virtual machine 201 reaches the flash cache 100: if the write operation hits a cache line of the small buffer unit 102, the data is written to the small buffer unit 102; if it misses the small buffer unit 102 but hits a cache line of the large buffer unit 101, the data is written to the large buffer unit 101; if it misses both, the acceleration flag is consulted, and if the flag is valid the data is written to the small buffer unit 102, otherwise the data is not written to the flash cache 100 at all and goes directly to the back-end storage cluster 203 through the common control device 202. This write flow guarantees that whenever a write hits a cache line of the small buffer unit 102, the data in the small buffer unit 102 is always the newest.
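A minimal sketch of that write path, assuming dict-like tiers keyed by address (the function name, the returned tier labels, and the one-entry-per-address simplification are all assumptions for illustration):

```python
def handle_write(addr, data, small, large, accel_on, backend):
    """Write path of Figures 2 and 6: small-unit hit first, then
    large-unit hit, then the acceleration flag, else bypass to the
    back-end storage cluster. Returns the tier that took the write."""
    if addr in small:          # small-unit hit: its copy stays the newest
        small[addr] = data
        return "small"
    if addr in large:          # small-unit miss, large-unit hit
        large[addr] = data
        return "large"
    if accel_on:               # double miss with a valid acceleration flag
        small[addr] = data
        return "small"
    backend[addr] = data       # flash cache bypassed entirely
    return "backend"
```

Checking the small unit before the large unit is what keeps the small unit authoritative when both tiers hold the same address.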
As shown in Figures 2 and 7, when a read operation from the virtual machine 201 reaches the flash cache 100: if the read hits a cache line of the small buffer unit 102, the data in the small buffer unit 102 is returned; if it misses the small buffer unit 102 but hits a cache line of the large buffer unit 101, the data in the large buffer unit 101 is returned; if it misses both the large and small buffer units, the acceleration flag is consulted. When the flag is valid, data of one large cache-line size is read from the back-end storage cluster 203, loaded into a cache line of the large buffer unit 101, and then returned to the virtual machine 201. When the acceleration flag is invalid, the data-temporary-storage flag is consulted, and if it is valid the corresponding small-unit cache-line data is read from the back-end storage cluster 203, loaded into a cache line of the small buffer unit 102, and then returned to the virtual machine 201. Otherwise, the data read from the back-end storage cluster bypasses the flash cache 100 and is delivered directly to the front-end virtual machine 201 through the common control device 202.
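The companion read path can be sketched the same way (again a simplification: loading "one large cache line" is reduced to a single entry, and all names are hypothetical):

```python
def handle_read(addr, small, large, accel_on, temp_on, backend):
    """Read path of Figures 2 and 7: small hit, large hit, then the
    acceleration flag, then the temporary-storage flag, else bypass.
    Returns (data, label) so the chosen branch is visible."""
    if addr in small:
        return small[addr], "small-hit"
    if addr in large:
        return large[addr], "large-hit"
    data = backend[addr]
    if accel_on:
        large[addr] = data     # load a full large line (one entry here)
        return data, "large-load"
    if temp_on:
        small[addr] = data     # stage a small-unit line for this read
        return data, "small-load"
    return data, "bypass"      # straight to the front-end application
```

A second read of the same address then hits the tier that the first read populated.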
With the non-volatile cache implementation method of this embodiment, the size of the state table that records the cache states can be kept within a bounded range, and in addition to accelerating read operations, all write operations can be accelerated as well. Furthermore, only part of the data is backed up, so the backup volume is limited and the backup operation has little impact on performance. Moreover, no hot-spare disk is needed, and continuous service can still be provided.
Embodiment two:
The device of this embodiment corresponds to the non-volatile cache implementation method of the preceding embodiment.
A non-volatile cache implementation device comprises a flash storage resource virtualization unit, a logical storage unit creation unit, a data write unit and a data read unit.
The flash storage resource virtualization unit virtualizes the physical flash storage resources into a flash storage pool.
The logical storage unit creation unit creates three kinds of logical storage units on the storage pool: a large buffer unit, a small buffer unit and a write-mirror unit. The large buffer unit provides conventional caching service; the small buffer unit provides acceleration service for random write operations and temporary data storage service for read operations; and the write-mirror unit provides redundant backup protection for the dirty data in the large and small buffer units.
The physical flash storage resources preferably comprise two or more physical trays, with the large buffer unit, the small buffer unit and the write-mirror unit each spanning the two or more physical trays. Further preferably, a single cache line of the small buffer unit or of the write-mirror unit resides on one physical tray, while a single cache line of the large buffer unit resides on one physical tray or spans two or more physical trays.
When the data write unit performs a data write: if the write operation hits a cache line of the small buffer unit, the data is written to the small buffer unit; if it misses the small buffer unit but hits a cache line of the large buffer unit, the data is written to the large buffer unit; if it misses both the large and small buffer units and the acceleration flag is valid, the data is written to the small buffer unit; otherwise the data is not written to the flash storage resources and is written directly to the back-end storage cluster.
When the data write unit writes data to the large buffer unit, the small buffer unit and the write-mirror unit, the physical location written in the large buffer unit is preferably on a different physical tray from the physical location written in the write-mirror unit, and the physical location written in the small buffer unit is likewise preferably on a different physical tray from the physical location written in the write-mirror unit.
When the data read unit performs a data read: if the read operation hits a cache line of the small buffer unit, the data in the small buffer unit is returned; if it misses the small buffer unit but hits a cache line of the large buffer unit, the data in the large buffer unit is returned; if it misses both the large and small buffer units and the acceleration flag is valid, data of one large-unit cache-line size is read from the back-end storage cluster, loaded into a cache line of the large buffer unit, and then returned to the virtual machine; if it misses both units and the acceleration flag is invalid but the data-temporary-storage flag is valid, the corresponding small-unit cache-line data is read from the back-end storage cluster, loaded into a cache line of the small buffer unit, and then returned to the virtual machine; otherwise the data read from the back-end storage cluster bypasses the flash storage resources and is delivered directly to the front-end virtual machine.
The sizes of the large buffer unit, the small buffer unit and the write-mirror unit can be partitioned in various ways, but preferably satisfy the following formula:
(Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= Available_DRAM_Size / Entry_size, wherein
Big_Size is the size of the large buffer unit,
Little_Size is the size of the small buffer unit,
Mirror_size is the size of the write-mirror unit,
Little_granularity is the cache-line size of the small buffer unit,
Big_granularity is the cache-line size of the large buffer unit,
Available_DRAM_Size is the size of the DRAM available for the cache state table,
Entry_size is the size of each cache state-table entry.
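The constraint above can be checked directly; the sketch below uses the formula's own variable names (the helper's name and the concrete example sizes in the usage note are assumptions):

```python
def state_table_fits(big_size, little_size, mirror_size,
                     little_gran, big_gran, dram_size, entry_size):
    """Partitioning constraint: the number of small-unit and
    mirror-unit cache lines plus the number of large-unit cache
    lines, times the per-entry size, must fit in the DRAM
    reserved for the cache state table."""
    entries = ((little_size + mirror_size) // little_gran
               + big_size // big_gran)
    return entries * entry_size <= dram_size
```

For instance, a 1 TiB large unit with 4 MiB lines plus 64 GiB small and mirror units with 4 KiB lines yields about 33.8 million entries; at 64 bytes per entry that is roughly 2 GiB of state table, so it fits in 4 GiB of DRAM but not in 1 GiB.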
In addition, the write-mirror unit may be composed of multiple logical write-mirror sub-units.
The physical tray on which an operation of the data write unit or data read unit lands is preferably selected by the following principle: when a physical tray fails, only the operations originally mapped to that tray are migrated to other trays, while the read and write operations already mapped to the other trays keep their mapping unchanged.
A cache line of the large buffer unit has at least a dirty state, a clean state and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; and the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, the only transition it accepts is to the clean state, on receiving a cache-line flush request. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving an invalidation request.
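The large-unit transitions just listed form a small state machine, which can be written out as a transition table (the event names are hypothetical labels for the requests described above; unlisted events leave the state unchanged):

```python
# (state, event) -> next state, per the large-unit description above.
LARGE_LINE_FSM = {
    ("invalid", "write"):      "dirty",   # data write request
    ("invalid", "load_clean"): "clean",   # clean-data load request
    ("dirty",   "flush"):      "clean",   # cache-line flush request
    ("clean",   "write"):      "dirty",
    ("clean",   "invalidate"): "invalid",
}

def large_step(state, event):
    """Apply one event; events not in the table are ignored."""
    return LARGE_LINE_FSM.get((state, event), state)
```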
A cache line of the small buffer unit has at least a dirty state, a clean state, an invalid state and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that the cache line holds no valid data; and the frozen state indicates that the cache line is currently frozen and can only be read, not written. When a cache line is in the invalid state, it transitions to the dirty state on receiving a data write request, and to the clean state on receiving a clean-data load request. When a cache line is in the dirty state, it transitions to the invalid state on receiving a cache-line flush request, and to the frozen state on receiving a migration request. When a cache line is in the clean state, it transitions to the dirty state on receiving a data write request, and to the invalid state on receiving a read request. When a cache line is in the frozen state, it transitions to the invalid state only on receiving a migration-complete notification.
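The small-unit state machine differs from the large-unit one in its frozen state and in dropping a clean line once it has been read (consistent with the small unit's temporary-storage role). A transition-table sketch, with hypothetical event labels for the requests described above:

```python
# (state, event) -> next state, per the small-unit description above.
SMALL_LINE_FSM = {
    ("invalid", "write"):        "dirty",    # data write request
    ("invalid", "load_clean"):   "clean",    # clean-data load request
    ("dirty",   "flush"):        "invalid",  # cache-line flush request
    ("dirty",   "migrate"):      "frozen",   # migration request
    ("clean",   "write"):        "dirty",
    ("clean",   "read"):         "invalid",  # temporary data dropped after read
    ("frozen",  "migrate_done"): "invalid",  # migration-complete notification
}

def small_step(state, event):
    """Apply one event; events not in the table are ignored,
    so a frozen line in particular rejects writes."""
    return SMALL_LINE_FSM.get((state, event), state)
```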
This embodiment preferably also includes a daemon unit, which flushes the dirty data in the write-mirror unit to the back-end storage cluster in the background, so as to keep the amount of dirty data in the flash storage resources that requires redundant backup within a predetermined range. The redundant backup preferably uses the write-mirror scheme.
Those skilled in the art will recognize that numerous adaptations may be made to the above description, and the embodiments therefore serve only to describe one or more particular implementations.
Although what are considered to be example embodiments of the invention have been shown and described, it will be apparent to those skilled in the art that various changes and substitutions may be made without departing from the spirit of the invention. Moreover, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central concept described herein. The invention is therefore not limited to the particular embodiments disclosed, and may encompass all embodiments and their equivalents falling within the scope of the invention.