A content-based caching method applied to content-addressable storage
Technical field
The present invention relates to a caching method for computer storage systems, and in particular to a content-based caching method applied to content-addressable storage (CAS). It belongs to the field of computer storage systems.
Background
In current desktop virtualization products, storing large numbers of virtual machine images causes the same data to be stored many times, which increases space pressure on the shared storage system. Content-addressable storage (CAS) is used to solve this problem of duplicate data. By detecting blocks with identical content and merging them, CAS avoids storing duplicate data more than once and thereby reduces the storage overhead of virtual machine images. While reducing storage overhead, it must not noticeably degrade virtual machine performance, and the optimization should be transparent to both the virtual machine monitor and the guest operating system.
A hash function creates a small numeric "fingerprint" from arbitrary data. It scrambles and mixes the input to produce a fingerprint called a hash value. In CAS, file content is divided into blocks, and the hash value of each block is computed and stored.
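As an illustration of per-block fingerprinting, the sketch below splits a byte string into fixed-size blocks and hashes each one. The 4096-byte block size and the choice of SHA-1 (from Python's standard `hashlib`) are assumptions for the example; the method described here does not specify a chunking scheme or a particular hash function.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-1 hex digest of each block."""
    hashes = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        hashes.append(hashlib.sha1(block).hexdigest())
    return hashes


data = b"A" * 8192 + b"B" * 100
seq = block_hashes(data)
print(len(seq))          # 3 blocks: two full ones and a 100-byte tail
print(seq[0] == seq[1])  # True: identical blocks yield the same fingerprint
```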
CAS divides a file into blocks, computes the sequence of block hash values, and uses this hash sequence to identify duplicate data. Only one copy of each block with identical content is kept, and the number of times each block is shared is recorded in the corresponding block file. For a file stored with CAS, what is actually stored for the original file is metadata such as the hash sequence of its blocks and the file's data size; the data corresponding to the hash sequence is kept in the shared storage system.
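The deduplication scheme above can be sketched as a minimal in-memory store. It is an illustration only, assuming fixed-size blocks and SHA-1 hash values; the names `CasStore`, `FileRecipe`, `put_file` and `get_file` are hypothetical. Each file is reduced to its hash sequence plus its size, identical blocks are stored once, and a per-block count records how many times each block is shared.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


@dataclass
class FileRecipe:
    hash_seq: list[str]  # hash value of each block, in file order
    size: int            # original file size in bytes


@dataclass
class CasStore:
    blocks: dict[str, bytes] = field(default_factory=dict)  # shared storage: hash -> block
    refcount: dict[str, int] = field(default_factory=dict)  # times each block is shared

    def put_file(self, data: bytes) -> FileRecipe:
        seq = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            h = hashlib.sha1(block).hexdigest()
            if h not in self.blocks:
                self.blocks[h] = block  # keep only one copy of identical content
            self.refcount[h] = self.refcount.get(h, 0) + 1
            seq.append(h)
        return FileRecipe(seq, len(data))

    def get_file(self, recipe: FileRecipe) -> bytes:
        # Reassemble the file from its hash sequence.
        return b"".join(self.blocks[h] for h in recipe.hash_seq)


store = CasStore()
r1 = store.put_file(b"abcdabcdwxyz")
r2 = store.put_file(b"abcdqrst")
print(len(store.blocks))   # 3 unique blocks: abcd, wxyz, qrst
print(store.get_file(r1))  # b'abcdabcdwxyz'
```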
Because processing speed and storage speed do not match, computer systems use caching. A cache is faster than the underlying storage device and acts as a buffer between the processor and actual storage: when the processor repeatedly reads and writes the same storage content, that data can be held temporarily in the cache to improve read/write performance.
Existing caching methods all use the memory address as the cache index, which causes problems when they are applied to a CAS file system. In a CAS file system, what a file's addresses hold is the hash value of the content rather than the content itself, so a conventional address-indexed cache cannot effectively improve file system performance. The present invention provides a caching method for CAS file systems that improves their performance.
Summary of the invention
The technical problem to be solved by the present invention is to provide a caching mechanism in the CAS file system that reduces actual file operations and thereby improves CAS performance. Conventional caching mechanisms that use the data address as the cache index are not suitable for CAS; the present invention instead uses the hash value of the cached content as the cache index, which can effectively improve CAS performance.
To implement the above method, the technical solution of the present invention is as follows:
A content-based caching method applied to content-addressable storage specifically comprises the following:
A cache module that uses the hash value of the cached content as the cache index is embedded in the CAS file system, and read/write operations on the cache module replace the original disk operations. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block is accessed directly from the cache module's internal buffer area. Only when the data is not in the buffer area does the cache module initiate the actual read or write operation, which reduces the number of actual reads and writes. The buffer area is a memory region defined in the cache module; it consists of multiple buffer units, each of which can cache one data block.
For a CAS file read, CAS needs to read the data blocks corresponding to the hash sequence from shared storage. The cache module first calls cache_read() to read the cache. If the required data block is read successfully, it is returned to CAS directly from the module's own buffer area. Otherwise, a disk read is issued to shared storage; once the data block has been obtained, it is written into the buffer area via cache_write() and then returned to the upper layer.
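The read path can be sketched as follows. This is a simplified model, not the module itself: `DictCache` is an unbounded stand-in for the buffer area, `SharedStorage.disk_read` is a hypothetical disk interface that counts reads, and short strings such as `"h1"` stand in for hash sequences. On a miss the block is fetched from shared storage and recorded with `cache_write()` before being returned, so repeated reads of the same block reach the disk only once.

```python
class DictCache:
    """Unbounded stand-in for the cache module's buffer area (illustration only)."""

    def __init__(self):
        self._units = {}

    def cache_read(self, hash_seq):
        # Returns (hit flag, data); data is None on a miss.
        if hash_seq in self._units:
            return True, self._units[hash_seq]
        return False, None

    def cache_write(self, hash_seq, block):
        self._units[hash_seq] = block


class SharedStorage:
    """Hypothetical shared storage that counts the disk reads actually issued."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.disk_reads = 0

    def disk_read(self, hash_seq):
        self.disk_reads += 1
        return self.blocks[hash_seq]


def cas_read_block(cache, storage, hash_seq):
    hit, block = cache.cache_read(hash_seq)
    if hit:
        return block                     # served from the buffer area
    block = storage.disk_read(hash_seq)  # miss: read from shared storage
    cache.cache_write(hash_seq, block)   # record it in the buffer area
    return block


cache = DictCache()
storage = SharedStorage({"h1": b"block-1", "h2": b"block-2"})
for h in ["h1", "h2", "h1", "h1"]:
    cas_read_block(cache, storage, h)
print(storage.disk_reads)  # 2: only the first read of each block hits the disk
```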
For a CAS file write, CAS needs to write a mapping from a hash sequence to a data block into shared storage. The cache module again uses the hash sequence as the index, calls cache_write() to save the data block into its own buffer area, and then issues the actual disk write to shared storage.
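A matching sketch of the write path, under the same simplifying assumptions (`DictCache` and `SharedStorage` are hypothetical stand-ins, and strings stand in for hash sequences). The block is recorded in the buffer area first and the disk write is issued afterwards, so every write still reaches shared storage; the cache only saves subsequent reads of that block.

```python
class DictCache:
    """Unbounded stand-in for the cache module's buffer area (illustration only)."""

    def __init__(self):
        self._units = {}

    def cache_read(self, hash_seq):
        # Returns (hit flag, data); data is None on a miss.
        if hash_seq in self._units:
            return True, self._units[hash_seq]
        return False, None

    def cache_write(self, hash_seq, block):
        self._units[hash_seq] = block


class SharedStorage:
    """Hypothetical shared storage that counts the disk writes actually issued."""

    def __init__(self):
        self.blocks = {}
        self.disk_writes = 0

    def disk_write(self, hash_seq, block):
        self.disk_writes += 1
        self.blocks[hash_seq] = block


def cas_write_block(cache, storage, hash_seq, block):
    cache.cache_write(hash_seq, block)   # 1. save the block in the buffer area, keyed by hash sequence
    storage.disk_write(hash_seq, block)  # 2. then issue the actual write to shared storage


cache = DictCache()
storage = SharedStorage()
cas_write_block(cache, storage, "h1", b"payload")
print(storage.disk_writes)     # 1
print(cache.cache_read("h1"))  # (True, b'payload')
```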
cache_read() is the read operation on the buffer area: it uses the supplied cache index to look up the corresponding buffer unit in the buffer area; on a cache hit it copies out the data block held in that unit, and on a miss it returns a failure flag. cache_write() is the write operation on the buffer area: it computes the index value from the data block and finds a buffer unit in the buffer area in which to save it.
The advantage of the content-based caching method for content-addressable storage of the present invention is that it provides a caching mechanism for use in a CAS file system that effectively improves the performance of the CAS file system.
Description of the drawings
Fig. 1 is a structural diagram of a CAS file system with the caching mechanism
Fig. 2 is the flow of a CAS read operation with caching
Fig. 3 is the flow of a CAS write operation with caching
Fig. 4 is the structure of a buffer unit
Fig. 5 is the structure of the buffer area
Fig. 6 is the operation flow of cache_read()
Fig. 7 is the operation flow of cache_write()
Embodiment
To make the objectives, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the content-based caching method for content-addressable storage of the present invention embeds a cache module in the CAS file system and replaces the original disk operations with read/write operations on the cache module. Only when the data is not in the buffer area does the cache module initiate a disk read or write, which reduces the number of disk reads and writes.
As shown in Fig. 2, when CAS initiates a read operation, the cache module calls its own cache_read() to read the cache. If the required data is read successfully, it is returned to CAS directly from the buffer area. Otherwise, a disk read is issued; once the data has been obtained, it is written into the buffer area via cache_write() and then returned to CAS.
As shown in Fig. 3, when CAS initiates a write operation, the cache module calls cache_write() to record the data in the cache and then issues the disk write.
Fig. 4 shows the data structure of a buffer unit. It comprises a region storing a hash sequence, an integer representing the unit's life value, and a region storing the data block.
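A buffer unit with the three fields described above might be modeled as follows. The field names, the `MAX_LIFE` value of 8, and the representation of the hash sequence as a `bytes` digest are assumptions made for the example; treating a life value of 0 as "empty" follows the rule, given later in this description, that a unit whose life value has dropped to 0 is no longer valid.

```python
from dataclasses import dataclass

MAX_LIFE = 8  # hypothetical maximum life value; the method does not fix one


@dataclass
class BufferUnit:
    hash_seq: bytes = b""  # region storing the hash sequence that identifies the block
    life: int = 0          # life value; 0 means the unit holds no valid data
    data: bytes = b""      # region storing the cached data block

    def is_valid(self) -> bool:
        return self.life > 0

    def fill(self, hash_seq: bytes, data: bytes) -> None:
        """Record a new block: copy the hash sequence and data, reset life to the maximum."""
        self.hash_seq = bytes(hash_seq)
        self.data = bytes(data)
        self.life = MAX_LIFE


unit = BufferUnit()
print(unit.is_valid())             # False
unit.fill(b"\x12\x34", b"payload")
print(unit.is_valid(), unit.life)  # True 8
```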
Fig. 5 shows how buffer units are linked within the buffer area. A group is the set of buffer units that share the same cache index. The cache index is computed from the hash sequence. To address a buffer unit, the cache index is first computed from the hash sequence to find the corresponding group, and a buffer unit within that group is then located by traversal.
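Group addressing can be sketched as below. Deriving the cache index from the first four bytes of the digest modulo the number of groups is an assumption (any function of the hash sequence fits the description), as are `NUM_GROUPS`, `UNITS_PER_GROUP` and the helper names `cache_index` and `find_unit`. Because a cryptographic hash value is already well mixed, taking a few of its bytes spreads blocks across groups without further hashing.

```python
import hashlib
from dataclasses import dataclass

NUM_GROUPS = 4       # hypothetical number of groups in the buffer area
UNITS_PER_GROUP = 2  # hypothetical number of buffer units per group


@dataclass
class Unit:
    hash_seq: bytes = b""
    life: int = 0  # 0 means the unit holds no valid data
    data: bytes = b""


def cache_index(hash_seq: bytes, num_groups: int = NUM_GROUPS) -> int:
    # The hash value is already uniformly mixed, so a few of its bytes suffice.
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def find_unit(buffer_area, hash_seq):
    """Locate the group by cache index, then traverse it to find the matching unit."""
    group = buffer_area[cache_index(hash_seq, len(buffer_area))]
    for unit in group:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return unit
    return None


buffer_area = [[Unit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]
key = hashlib.sha1(b"some block").digest()
print(find_unit(buffer_area, key))  # None: nothing cached yet
slot = buffer_area[cache_index(key)][0]
slot.hash_seq, slot.life, slot.data = key, 1, b"some block"
print(find_unit(buffer_area, key) is slot)  # True
```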
cache_read() performs the cache read. As shown in Fig. 6, the cache index is first computed from the hash sequence, and the units in the corresponding group are traversed and their hash sequences compared with the requested one. If an identical hash sequence is found, this is a cache hit: the contents of that buffer unit's data region are returned and the unit's life value is reset. If no unit matches, this is a cache miss, and the life value of every buffer unit in the group is decreased by one. The return value indicates whether the cache was hit.
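The cache_read() flow of Fig. 6 can be sketched as follows, assuming an illustrative unit layout and index function (`Unit`, `cache_index` and `MAX_LIFE = 4` are hypothetical). A hit resets the matching unit's life value; a miss decrements the life value of every unit in the group, never below 0. The function returns a `(hit, data)` pair, with `None` as the data on a miss.

```python
import hashlib
from dataclasses import dataclass

MAX_LIFE = 4  # hypothetical maximum life value


@dataclass
class Unit:
    hash_seq: bytes = b""
    life: int = 0  # 0 means the unit holds no valid data
    data: bytes = b""


def cache_index(hash_seq: bytes, num_groups: int) -> int:
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def cache_read(buffer_area, hash_seq):
    """Return (hit, data). A hit resets the unit's life; a miss ages the whole group."""
    group = buffer_area[cache_index(hash_seq, len(buffer_area))]
    for unit in group:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            unit.life = MAX_LIFE  # hit: reset the life value
            return True, unit.data
    for unit in group:  # miss: every unit in the group loses one life unit
        unit.life = max(unit.life - 1, 0)
    return False, None


area = [[Unit(), Unit()]]  # a single group keeps the example deterministic
key = hashlib.sha1(b"block").digest()
area[0][0] = Unit(key, MAX_LIFE, b"block")
print(cache_read(area, key))  # (True, b'block')
other = hashlib.sha1(b"other").digest()
print(cache_read(area, other))  # (False, None)
print(area[0][0].life)  # 3
```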
cache_write() records data in the cache. As shown in Fig. 7, the cache index is first computed from the hash sequence, and the units in the corresponding group are traversed and their hash sequences compared with the given one. If an identical hash sequence is found, the data is already cached and nothing more needs to be done. Otherwise, the buffer unit with the smallest life value is located and replaced. Replacement consists of copying the data region, copying the hash sequence, and resetting the life value.
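Similarly, a sketch of the cache_write() flow of Fig. 7 under the same illustrative assumptions (`Unit`, `cache_index` and `MAX_LIFE` are hypothetical). If the hash sequence is already present in a valid unit, nothing is done; otherwise the unit with the smallest life value in the group is chosen as the victim, and its data region, hash sequence and life value are overwritten. Empty units have a life value of 0, so they are filled before any live entry is evicted.

```python
import hashlib
from dataclasses import dataclass

MAX_LIFE = 4  # hypothetical maximum life value


@dataclass
class Unit:
    hash_seq: bytes = b""
    life: int = 0  # 0 means the unit holds no valid data
    data: bytes = b""


def cache_index(hash_seq: bytes, num_groups: int) -> int:
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def cache_write(buffer_area, hash_seq, data):
    group = buffer_area[cache_index(hash_seq, len(buffer_area))]
    for unit in group:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return  # already cached: nothing else to do
    victim = min(group, key=lambda u: u.life)  # unit with the smallest life value
    victim.data = bytes(data)          # copy the data region
    victim.hash_seq = bytes(hash_seq)  # copy the hash sequence
    victim.life = MAX_LIFE             # reset the life value


area = [[Unit(), Unit()]]  # a single group keeps the example deterministic
k1, k2, k3 = (hashlib.sha1(s).digest() for s in (b"a", b"b", b"c"))
cache_write(area, k1, b"a")  # fills the first empty unit
cache_write(area, k2, b"b")  # fills the second empty unit
area[0][0].life = 1          # simulate k1 having aged through misses
cache_write(area, k3, b"c")  # evicts k1, the unit with the smallest life value
print([u.data for u in area[0]])  # [b'c', b'b']
```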
The life value determines which buffer unit is replaced. When a buffer unit has just recorded data, its life value is reset to the maximum; thereafter it is decreased by one each time its group is traversed on a cache miss. A buffer unit whose life value has dropped to 0 is considered no longer valid. The replacement operation looks for the buffer unit with the smallest life value and replaces its contents with the new data.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may still be made to the present invention, and any modification or partial replacement that does not depart from the spirit and scope of the present invention shall be covered by the scope of the claims of the present invention.