CN102508790A - Content-based cache method applied to content analysis storage - Google Patents


Info

Publication number
CN102508790A
Authority
CN
China
Prior art keywords
cache
read
cas
data block
write
Legal status
Granted
Application number
CN2011103650277A
Other languages
Chinese (zh)
Other versions
CN102508790B (en)
Inventor
龚韬 (Gong Tao)
肖利民 (Xiao Limin)
赵国玉 (Zhao Guoyu)
李秀桥 (Li Xiuqiao)
阮利 (Ruan Li)
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
2011-11-17
Filing date
2011-11-17
Publication date
2012-06-20
Application filed by Beihang University
Priority to CN201110365027.7A
Publication of CN102508790A
Application granted
Publication of CN102508790B
Status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a content-based cache method applied to content-addressable storage (CAS). The method embeds, in a CAS file system, a cache module that uses the hash of the cached content as the cache index, and replaces the original disk operations with read/write operations of the cache module. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block can be taken directly from the cache area inside the cache module. When the requested data is not in the cache area, the cache module initiates the actual read or write, so that the number of actual reads and writes is reduced. The cache area is a storage region defined in the cache module and consists of a plurality of cache units, each of which can cache one data block. The method provides a caching mechanism for CAS file systems that effectively improves their performance.

Description

Content-based caching method applied to content-addressable storage
Technical field
The present invention relates to a caching method in a computer storage system, and specifically to a content-based caching method applied to content-addressable storage (CAS). It belongs to the field of computer storage systems.
Background art
In current desktop virtualization products, storing large numbers of virtual machine images causes the same data to be stored many times over, which increases the pressure on the storage space of the shared storage system. Content-addressable storage (CAS) technology is used to solve this problem: by detecting data with identical content, it merges duplicates into a single stored copy, avoiding repeated storage of the same data and reducing the storage overhead of virtual machine images. While reducing storage overhead, the optimization must not significantly affect virtual machine performance, and it should be transparent to the virtual machine monitor and to the guest operating system.
A hash function is a method of creating a small numerical "fingerprint" from arbitrary data. The function scrambles and mixes the data to produce a fingerprint called a hash value. In CAS, file content is divided into blocks, and the hash value of each block is computed and stored.
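The blocking-and-hashing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the fixed 4096-byte block size and the choice of SHA-256 as the fingerprint function are assumptions, since the text names neither, and the helper names `split_blocks`, `fingerprint` and `hash_sequence` are hypothetical.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # hypothetical fixed block size; the text does not specify one


def split_blocks(content: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split file content into fixed-size blocks (the last block may be shorter)."""
    return [content[i:i + block_size] for i in range(0, len(content), block_size)]


def fingerprint(block: bytes) -> bytes:
    """Return the SHA-256 digest of a block, used here as its content fingerprint."""
    return hashlib.sha256(block).digest()


def hash_sequence(content: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Compute the per-block hash sequence of a file."""
    return [fingerprint(b) for b in split_blocks(content, block_size)]


seq = hash_sequence(b"A" * 8192 + b"B" * 100)
print(len(seq), seq[0] == seq[1])  # 3 True
```

Identical blocks always yield identical fingerprints, which is what lets CAS detect duplicates by comparing hashes instead of comparing block contents.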
CAS splits a file into blocks, computes the sequence of block hash values, and uses this hash sequence to identify duplicate data. Only one copy is kept of each block with identical content, and the number of times each block is shared is recorded in the block file. For a file stored in CAS mode, the original file actually stores only information such as the hash sequence of its blocks and the file data size; the data corresponding to the hash sequence is kept in the shared storage system.
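A minimal sketch of the deduplicating store described above follows. It assumes fixed-size blocks, SHA-256 fingerprints and an in-memory dictionary standing in for the shared storage system. `SharedStore`, `store_file` and `read_file` are hypothetical names. The file "recipe" keeps only the hash sequence and the data size. Each distinct block is stored once, together with a count of how many times it is shared.

```python
import hashlib

BLOCK_SIZE = 4  # tiny hypothetical block size so the example is easy to follow


class SharedStore:
    """Shared storage keeping one copy of each distinct block plus its share count."""

    def __init__(self):
        self.blocks = {}     # hash -> block data, stored only once
        self.ref_count = {}  # hash -> number of times the block is shared

    def put(self, block: bytes) -> bytes:
        h = hashlib.sha256(block).digest()
        if h not in self.blocks:  # new content: keep a single copy
            self.blocks[h] = block
        self.ref_count[h] = self.ref_count.get(h, 0) + 1
        return h


def store_file(store: SharedStore, content: bytes) -> dict:
    """Store a file CAS-style and return its recipe (hash sequence and data size)."""
    hashes = [store.put(content[i:i + BLOCK_SIZE])
              for i in range(0, len(content), BLOCK_SIZE)]
    return {"hash_sequence": hashes, "size": len(content)}


def read_file(store: SharedStore, recipe: dict) -> bytes:
    """Reassemble a file from its recipe by looking up each block by hash."""
    return b"".join(store.blocks[h] for h in recipe["hash_sequence"])
```

Storing `b"AAAABBBB"` and `b"AAAACCCC"` leaves three distinct blocks in the store, with the `AAAA` block shared twice.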
Because processing speed and storage speed in a computer do not match, computer systems use caching mechanisms. A cache is faster than the storage actually being operated on and acts as a buffer between the processor and that storage. When the processor repeatedly reads and writes the same storage content, the data can be kept temporarily in the cache to improve read/write performance.
Existing caching methods all use the storage address as the cache index, and this causes problems when they are applied to a CAS file system. In a CAS file system, what is stored at a file address is the hash value of the content rather than the content itself, so a cache indexed by address cannot effectively improve file system performance. The present invention provides a caching method for CAS file systems that improves their performance.
Summary of the invention
The technical problem solved by the present invention is to provide a caching mechanism in the CAS file system that reduces actual file operations and thereby improves CAS performance. Conventional caching mechanisms that use the data address as the cache index are not suited to CAS; the present invention instead uses the hash value of the cached content as the cache index, which effectively improves CAS performance.
To achieve the above, the technical solution of the present invention is as follows:
A content-based caching method applied to content-addressable storage, comprising the following:
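The sketch below makes the argument concrete. It replays block reads from two hypothetical virtual machine images that share most of their content. The reads go once through a cache keyed by address (image, block number) and once through a cache keyed by the block's content hash. The cache is unbounded to isolate the effect of the choice of index, and the function and data names are illustrative assumptions. Keying by hash is possible in CAS because the file already stores its hash sequence, so the hash of a block is known before the block is read.

```python
import hashlib


def count_disk_reads(accesses, key_of):
    """Replay block reads through an unbounded cache keyed by key_of(access).

    Returns how many reads missed the cache and had to go to disk.
    """
    cached = set()
    misses = 0
    for access in accesses:
        key = key_of(access)
        if key not in cached:
            misses += 1  # miss: fetch from disk, then remember the key
            cached.add(key)
    return misses


# Two hypothetical VM images built from the same base system.
image_a = [b"kernel", b"libc", b"config-a"]
image_b = [b"kernel", b"libc", b"config-b"]

# Each access is (image name, block number, block content).
accesses = [("a", i, blk) for i, blk in enumerate(image_a)] + \
           [("b", i, blk) for i, blk in enumerate(image_b)]

by_address = count_disk_reads(accesses, key_of=lambda a: (a[0], a[1]))
by_content = count_disk_reads(accesses, key_of=lambda a: hashlib.sha256(a[2]).digest())
print(by_address, by_content)  # 6 4
```

The address-keyed cache treats the shared `kernel` and `libc` blocks of the second image as new data, while the hash-keyed cache serves them from the copies already cached for the first image.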
A cache module that uses the hash value of the cached content as the cache index is embedded in the CAS file system, and the cache module's read/write operations replace the original disk operations. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block can be taken directly from the cache area inside the cache module. Only when the data is not in the cache area does the cache module initiate an actual read or write, which reduces the number of actual reads and writes. The cache area is a memory region defined in the cache module; it consists of a plurality of cache units, each of which can cache one data block.
For a CAS file read operation, CAS needs to read the data blocks corresponding to a hash sequence from shared storage. The cache module first calls cache_read() to read from the cache. If the required data block is read successfully, it is returned to CAS directly from the module's own cache area. Otherwise, the module initiates a disk read of shared storage; after obtaining the data block, it writes the block into the cache area through cache_write() and then returns it to the upper layer.
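The read path can be sketched as follows, with plain dictionaries standing in for both the cache area and the shared storage. `CachedCAS`, `DictCache` and `read_block` are hypothetical names, and `None` stands in for the failure flag that `cache_read()` returns on a miss. Only the order of operations follows the text: cache read, disk read on a miss, cache fill, return.

```python
class DictCache:
    """Stand-in for the cache module's cache area: hash sequence -> data block."""

    def __init__(self):
        self.area = {}

    def cache_read(self, hash_seq):
        return self.area.get(hash_seq)  # None plays the role of the failure flag

    def cache_write(self, hash_seq, block):
        self.area[hash_seq] = block


class CachedCAS:
    def __init__(self, shared_store, cache):
        self.shared_store = shared_store  # dict standing in for shared storage on disk
        self.cache = cache
        self.disk_reads = 0

    def read_block(self, hash_seq):
        block = self.cache.cache_read(hash_seq)  # 1. try the cache area first
        if block is not None:
            return block                         # hit: return to CAS directly
        self.disk_reads += 1                     # 2. miss: actual disk read
        block = self.shared_store[hash_seq]
        self.cache.cache_write(hash_seq, block)  # 3. fill the cache area
        return block                             # 4. return to the upper layer


cas = CachedCAS({b"h1": b"data"}, DictCache())
cas.read_block(b"h1")
cas.read_block(b"h1")
print(cas.disk_reads)  # 1
```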
For a CAS file write operation, CAS needs to write a mapping from a hash sequence to a data block into shared storage. The cache module again uses the hash sequence as the index, calls cache_write() to save the data block into its own cache area, and then initiates the actual disk write to shared storage.
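Below is a minimal sketch of the write path under the same stand-in assumptions: dictionaries for the cache area and for shared storage, and the hypothetical names `WriteThroughCAS` and `write_block`. The block is first recorded in the cache area under its hash sequence and then written to shared storage.

```python
class WriteThroughCAS:
    """Sketch of the write path: record in the cache first, then write to disk."""

    def __init__(self):
        self.cache_area = {}    # stand-in for the cache module's cache area
        self.shared_store = {}  # stand-in for shared storage on disk
        self.disk_writes = 0

    def cache_write(self, hash_seq, block):
        self.cache_area[hash_seq] = block

    def write_block(self, hash_seq, block):
        # 1. Index the block by its hash sequence and save it in the cache area.
        self.cache_write(hash_seq, block)
        # 2. Then perform the actual disk write of the hash -> block mapping.
        self.disk_writes += 1
        self.shared_store[hash_seq] = block


cas = WriteThroughCAS()
cas.write_block(b"h1", b"data")
print(cas.disk_writes)  # 1
```

Every write also reaches shared storage, so this is a write-through policy. A cached unit never holds data that is missing from disk, and replacing it later therefore requires no write-back.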
cache_read() is the read operation on the cache area. It uses the supplied cache index to locate the corresponding cache unit in the cache area. On a cache hit it copies out the data block held in that unit, and on a miss it returns a failure flag. cache_write() is the write operation on the cache area. It computes the index value from the data block and finds a cache unit in the cache area in which to store the block.
The advantage of this content-based caching method for content-addressable storage is that it provides a caching mechanism for use in a CAS file system, effectively improving the performance of the CAS file system.
Brief description of the drawings
Fig. 1 is a structural diagram of a CAS file system with the caching mechanism
Fig. 2 is the flow of a CAS read operation with caching
Fig. 3 is the flow of a CAS write operation with caching
Fig. 4 is the structure of a cache unit
Fig. 5 is the structure of the cache area
Fig. 6 is the operation flow of cache_read()
Fig. 7 is the operation flow of cache_write()
Detailed description of the embodiments
To make the objectives, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the content-based caching method for content-addressable storage of the present invention embeds a cache module in the CAS file system and replaces the original disk operations with the cache module's read/write operations. Only when the requested data is not in the cache area does the cache module initiate a disk read or write, which reduces the number of disk reads and writes.
As shown in Fig. 2, when CAS initiates a read operation, the cache module calls its own cache_read() to read from the cache. If the required data is read successfully, it is returned to CAS directly from the cache area. Otherwise, the module initiates a disk read, writes the obtained data into the cache area through cache_write(), and then returns it to CAS.
As shown in Fig. 3, when CAS initiates a write operation, the cache module calls cache_write() to record the data in the cache and then initiates the disk write.
Fig. 4 shows the data structure of a cache unit. It comprises a region that stores a hash sequence, an integer representing the unit's life value, and a region that stores the data block.
Fig. 5 shows how cache units are connected within the cache area. A group represents a set of cache units that share the same cache index, and the cache index is computed from the hash sequence. To address a cache unit, the cache index is first computed from the hash sequence to find the corresponding group. The cache unit is then identified within the group by traversal.
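The cache unit of Fig. 4 and the grouped cache area of Fig. 5 might be modeled as below. The group count, the units per group and the index function (first four bytes of the hash sequence modulo the number of groups) are assumptions, because the text says only that the cache index is computed from the hash sequence. `find_unit` is a hypothetical helper showing the two-step addressing: compute the index, then traverse the group. Treating a unit whose life value is 0 as empty follows the statement that such a unit is no longer valid.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_GROUPS = 8       # hypothetical number of groups
UNITS_PER_GROUP = 4  # hypothetical number of cache units per group


@dataclass
class CacheUnit:
    hash_seq: Optional[bytes] = None  # region holding the hash sequence
    life: int = 0                     # integer life value (0 = not valid)
    data: bytes = b""                 # region holding the data block


def make_cache_area() -> List[List[CacheUnit]]:
    """Cache area: NUM_GROUPS groups, each a list of cache units."""
    return [[CacheUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]


def cache_index(hash_seq: bytes) -> int:
    """Derive the cache index (group number) from the hash sequence.

    Assumption: interpret the first 4 bytes as an integer, modulo NUM_GROUPS.
    """
    return int.from_bytes(hash_seq[:4], "big") % NUM_GROUPS


def find_unit(area: List[List[CacheUnit]], hash_seq: bytes) -> Optional[CacheUnit]:
    """Locate a unit: compute the index, then traverse that group comparing hashes."""
    for unit in area[cache_index(hash_seq)]:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return unit
    return None
```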
cache_read() performs the cache read. As shown in Fig. 6, it first computes the cache index from the hash sequence. It then traverses the group corresponding to that index, comparing each unit's stored hash sequence with the requested one. If an identical hash sequence is found, the access is a cache hit: the contents of the unit's data region are returned and the unit's life value is reset. If no unit matches, the access is a cache miss, and the life value of every cache unit in the group is decreased by one. The return value indicates whether the cache was hit.
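A sketch of `cache_read()` following the steps of Fig. 6 is given below. The Python signature, `MAX_LIFE` and the index function are hypothetical, and returning `None` plays the role of the miss indication. As the text describes, a hit resets the matching unit's life value and a miss decrements every unit in the group. Clamping the life value at 0 is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LIFE = 8  # hypothetical maximum life value


@dataclass
class CacheUnit:
    hash_seq: Optional[bytes] = None  # stored hash sequence
    life: int = 0                     # life value (0 = not valid)
    data: bytes = b""                 # cached data block


def cache_index(hash_seq: bytes, num_groups: int) -> int:
    # Assumed index function: first 4 bytes of the hash modulo the group count.
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def cache_read(area: List[List[CacheUnit]], hash_seq: bytes) -> Optional[bytes]:
    """Return a copy of the cached block on a hit, or None on a miss."""
    group = area[cache_index(hash_seq, len(area))]
    for unit in group:  # traverse the group, comparing hash sequences
        if unit.life > 0 and unit.hash_seq == hash_seq:
            unit.life = MAX_LIFE     # hit: reset the unit's life value
            return bytes(unit.data)  # copy out the data region
    for unit in group:  # miss: age every unit in the group
        unit.life = max(0, unit.life - 1)
    return None
```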
cache_write() records data in the cache. As shown in Fig. 7, it first computes the cache index from the hash sequence. It then traverses the corresponding group, comparing hash sequences. If an identical hash sequence is found, the block is already cached and nothing further needs to be done. Otherwise, the cache unit with the smallest life value is located and replaced. Replacement consists of copying the data region, copying the hash sequence and resetting the life value.
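A matching sketch of `cache_write()` follows Fig. 7 and uses the same hypothetical structures and assumed index function. On a miss it picks the unit with the smallest life value, taking the first such unit when several tie. It then replaces that unit's data region, hash sequence and life value.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LIFE = 8  # hypothetical maximum life value


@dataclass
class CacheUnit:
    hash_seq: Optional[bytes] = None  # stored hash sequence
    life: int = 0                     # life value (0 = not valid)
    data: bytes = b""                 # cached data block


def cache_index(hash_seq: bytes, num_groups: int) -> int:
    # Assumed index function: first 4 bytes of the hash modulo the group count.
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def cache_write(area: List[List[CacheUnit]], hash_seq: bytes, block: bytes) -> None:
    """Record a block in the cache area, replacing the lowest-life unit on a miss."""
    group = area[cache_index(hash_seq, len(area))]
    for unit in group:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return  # already cached: nothing else to do
    victim = min(group, key=lambda u: u.life)  # unit with the smallest life value
    victim.data = bytes(block)  # copy the data region
    victim.hash_seq = hash_seq  # copy the hash sequence
    victim.life = MAX_LIFE      # reset the life value
```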
The life value determines which cache unit is replaced. When a cache unit has just recorded data, its life value is reset to the maximum value; after that, it is decremented by one each time the unit is traversed. Once the life value reaches 0, the cache unit is considered no longer valid. The replacement operation finds the cache unit with the smallest life value and replaces its contents with the new data.
Finally, it should be noted that the above embodiment is intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the invention may still be modified or equivalently substituted. Any modification or partial substitution that does not depart from the spirit and scope of the invention shall be covered by the scope of the claims of the present invention.
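The life-value rules can be traced on a single two-unit group. This sketch follows the reading given for cache_read(), in which life values drop when the group is missed. `MAX_LIFE = 3` and the helper names are assumptions chosen to keep the trace short. The scheme behaves roughly like an aging approximation of least-recently-used replacement. Units that keep getting hit stay near the maximum, while idle units drift toward 0 and are replaced first.

```python
from typing import List

MAX_LIFE = 3  # hypothetical maximum life value


def after_hit(lives: List[int], i: int) -> List[int]:
    """A unit that is hit (or freshly written) has its life value reset to the maximum."""
    out = list(lives)
    out[i] = MAX_LIFE
    return out


def after_miss(lives: List[int]) -> List[int]:
    """On a miss in the group, every unit's life value drops by one, stopping at 0."""
    return [max(0, life - 1) for life in lives]


def victim(lives: List[int]) -> int:
    """Replacement picks the unit with the smallest life value (first one on ties)."""
    return min(range(len(lives)), key=lambda i: lives[i])


lives = [MAX_LIFE, MAX_LIFE]  # two units just written
lives = after_miss(lives)     # [2, 2]
lives = after_hit(lives, 0)   # [3, 2]
lives = after_miss(lives)     # [2, 1]
lives = after_miss(lives)     # [1, 0] -> unit 1 is no longer valid
print(lives, victim(lives))   # [1, 0] 1
```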

Claims (4)

1. A content-based caching method applied to content-addressable storage, specifically comprising: embedding, in a CAS file system, a cache module that uses the hash value of the cached content as the cache index, and replacing the original disk operations with read/write operations of the cache module; when the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached, and if it is, the block is taken directly from the cache area inside the cache module; only when the data is not in the cache area does the cache module initiate an actual read or write, so as to reduce the number of actual reads and writes; wherein the cache area is a memory region defined in the cache module and consists of a plurality of cache units, each of which can cache one data block.
2. The content-based caching method applied to content-addressable storage according to claim 1, characterized in that: for a CAS file read operation, CAS needs to read the data blocks corresponding to a hash sequence from shared storage, and the cache module first calls cache_read() to read from the cache; if the required data block is read successfully, it is returned to CAS directly from the module's own cache area; otherwise, a disk read of shared storage is initiated, and after the data block is obtained it is written into the cache area through cache_write() and then returned to the upper layer.
3. The content-based caching method applied to content-addressable storage according to claim 1, characterized in that: for a CAS file write operation, when CAS initiates the write, CAS needs to write a mapping from a hash sequence to a data block into shared storage; the cache module likewise uses the hash sequence as the index, calls cache_write() to save the data block into its own cache area, and then initiates the actual disk write to shared storage.
4. The content-based caching method applied to content-addressable storage according to claim 2 or 3, characterized in that: cache_read() is a read operation on the cache area, which uses the supplied cache index to locate the corresponding cache unit in the cache area, copies out the data block in the cache unit on a cache hit, and returns a failure flag on a miss; cache_write() is a write operation on the cache area, which computes the index value from the data block and finds a cache unit in the cache area in which to store the block.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110365027.7A CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110365027.7A CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Publications (2)

Publication Number Publication Date
CN102508790A 2012-06-20
CN102508790B 2014-08-13

Family

ID=46220881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110365027.7A Active CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Country Status (1)

Country Link
CN (1) CN102508790B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611898B1 (en) * 2000-12-22 2003-08-26 Convergys Customer Management Group, Inc. Object-oriented cache management system and method
CN101887398A (en) * 2010-06-25 2010-11-17 Inspur (Beijing) Electronic Information Industry Co., Ltd. (浪潮(北京)电子信息产业有限公司) Method and system for dynamically enhancing input/output (I/O) throughput of server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
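Han Dezhi et al. (韩德志等): "Research on Key Technologies of High-Availability Storage Networks" (《高可用存储网络关键技术的研究》), 31 July 2009, Science Press (科学出版社) *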

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034684A (en) * 2012-11-27 2013-04-10 Beihang University (北京航空航天大学) Optimizing method for storing virtual machine mirror images based on CAS (content addressable storage)
CN103095686A (en) * 2012-12-19 2013-05-08 Huawei Technologies Co., Ltd. (华为技术有限公司) Hot metadata access control method and server
CN103095686B (en) * 2012-12-19 2016-06-08 Huawei Technologies Co., Ltd. (华为技术有限公司) Focus metadata access control method and service device
CN108537719A (en) * 2018-03-26 2018-09-14 Shanghai Jiao Tong University (上海交通大学) System and method for improving graphics processing unit performance
US11249915B2 (en) * 2020-01-09 2022-02-15 Vmware, Inc. Content based cache failover

Also Published As

Publication number Publication date
CN102508790B (en) 2014-08-13

Similar Documents

Publication Publication Date Title
CN110825748B (en) High-performance and easily-expandable key value storage method by utilizing differentiated indexing mechanism
US10303596B2 (en) Read-write control method for memory, and corresponding memory and server
US20180356993A1 (en) Optimized data placement for individual file accesses on deduplication-enabled sequential storage systems
CN105574104B (en) A kind of LogStructure storage system and its method for writing data based on ObjectStore
US9921955B1 (en) Flash write amplification reduction
CN109697016B (en) Method and apparatus for improving storage performance of containers
CN102521147B (en) Management method by using rapid non-volatile medium as cache
CN105183839A (en) Hadoop-based storage optimizing method for small file hierachical indexing
CN103399823B (en) The storage means of business datum, equipment and system
CN111930316B (en) Cache read-write system and method for content distribution network
CN104267912A (en) NAS (Network Attached Storage) accelerating method and system
CN103150395B (en) Directory path analysis method of solid state drive (SSD)-based file system
US11132145B2 (en) Techniques for reducing write amplification on solid state storage devices (SSDs)
US20220129420A1 (en) Method for facilitating recovery from crash of solid-state storage device, method of data synchronization, computer system, and solid-state storage device
CN103049224A (en) Method, device and system for importing data into physical tape
CN106933494A (en) Mix the operating method and device of storage device
CN103942161A (en) Redundancy elimination system and method for read-only cache and redundancy elimination method for cache
Allu et al. {Can’t} We All Get Along? Redesigning Protection Storage for Modern Workloads
CN106775501A (en) Elimination of Data Redundancy method and system based on nonvolatile memory equipment
CN102508790A (en) Content-based cache method applied to content analysis storage
US8151053B2 (en) Hierarchical storage control apparatus, hierarchical storage control system, hierarchical storage control method, and program for controlling storage apparatus having hierarchical structure
US11010091B2 (en) Multi-tier storage
US10055304B2 (en) In-memory continuous data protection
US11586353B2 (en) Optimized access to high-speed storage device
KR102403063B1 (en) Mobile device and management method of mobile device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Xiao Limin

Inventor after: Gong Tao

Inventor after: Zhao Guoyu

Inventor after: Li Xiuqiao

Inventor after: Ruan Li

Inventor before: Gong Tao

Inventor before: Xiao Limin

Inventor before: Zhao Guoyu

Inventor before: Li Xiuqiao

Inventor before: Ruan Li

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: GONG TAO XIAO LIMIN ZHAO GUOYU LI XIUQIAO RUAN LI TO: XIAO LIMIN GONG TAO ZHAO GUOYU LI XIUQIAO RUAN LI

C14 Grant of patent or utility model
GR01 Patent grant