Background technology
With the rapid development of information technology, society is becoming increasingly information-driven, and the amount of digital information held by individuals is growing explosively. Against this background, storage devices have become an indispensable tool in daily life. The variety of storage devices, however, also brings many problems: how to keep data consistent across a user's multiple storage devices, how to keep the data on all of those devices safe and reliable, how to cope with limited storage space, and so on. How to provide a storage service that is efficient, easy to manage, and able to grow dynamically in capacity has therefore become an active research topic.
The rapid development of computer networking has brought new breakthroughs to storage technology. Technologies such as network attached storage (NAS) and storage area networks (SAN) are on the rise and have changed traditional storage technology to a large extent, but they are very expensive on the one hand and ill-suited to wide area networks on the other. The concept of cloud storage, proposed by the Storage Networking Industry Association (SNIA) and companies such as Amazon, offers a revolutionary vision for future network storage in view of current trends in storage technology: data storage can be delivered as a service in the same way as water and electricity are today, with the network reaching every household, various pricing plans on offer, and different services provided to different users. On the one hand, this gives users a high-quality storage service, for example dynamically growing storage space and access from any point on the network. On the other hand, it is transparent to users: all technical problems are handed to a dedicated cloud storage service provider, and users need not concern themselves with issues such as data reliability and security. It also gives users a storage service that is both inexpensive and of high quality, so that they need not spend large sums on maintaining and upgrading storage systems.
Architecturally, cloud storage is generally divided into two major parts: the cloud storage service and the client-side cloud storage system. The cloud storage service is a data service that a few large companies, acting as service providers, deploy across the Internet; it can be accessed through a defined interface, and the data kept in it are called cloud storage data. The client-side cloud storage system is the storage system deployed and installed on the client. It generally comprises a dynamically loadable kernel module for capturing file system commands, a network communication module that interacts with the cloud storage service interface to provide cloud storage data, and an execution module that processes the captured commands. The execution module interacts with the cloud storage service through the network communication module, and different client-side cloud storage systems differ mainly in their execution modules. A file stored in the client-side cloud storage system, like a file stored on a hard disk, logically consists of many data blocks; the difference is that each data block in the client-side cloud storage system is itself a file, referred to as a data block file. According to the captured file system commands, the client-side cloud storage system stores data block files in the cloud storage service, fetches data block files from it, modifies the data in data block files, and manages the data block files.
Mainstream cloud storage service providers currently include Amazon's S3 and Microsoft's Live Mesh. Corresponding client-side cloud storage systems have appeared for the different providers; representative ones include Dropbox and SugarSync. They all keep a complete copy of the data on the client and, after each modification, compute the delta and send it back to the cloud storage service. When such a system accesses cloud storage data, it must first download the data in full to the client and then operate on the client copy. This strategy has the following significant drawbacks. First, access to cloud storage data is inefficient: the system has to download all cloud storage data to the client indiscriminately and can only operate once the download is complete, so the user spends a long time waiting, and a user who only needs to work on one small file still has to download all the data from the cloud storage service. Second, it is not transparent to the user: cloud storage data cannot be accessed in the same way as local file system data, and each time the user must open the client-side cloud storage system and carry out a data synchronization before the cloud storage data can be accessed. Third, it is overly sensitive to network conditions: if a network failure occurs while the client-side cloud storage system is requesting data from the cloud storage service, all cloud storage data become unavailable.
Summary of the invention
The object of the present invention is to overcome the shortcomings of the prior art by proposing a data caching method for a client-side cloud storage system. The method caches data block files and exploits program locality to improve response speed, and it also allows the cloud storage data held in the local data cache area to be accessed when the network is disconnected.
The data caching method for a client-side cloud storage system proposed by the present invention is characterized in that it uses a flash disk (or another storage medium such as a SIM card or an SSD) as the carrier of the cloud file system. The flash disk is divided into a system area and a data area. The system area holds the operating system, and the computer boots from the flash disk. The data area is further divided into a local data cache area and a metadata database: the local data cache area holds the data block cache files obtained from the cloud storage service, and the metadata database records the descriptive information of the cloud storage data;
The client-side cloud storage system in this method comprises a dynamically loadable kernel module, a network communication module, and an execution module capable of cache management. The method comprises the following steps:
1) with the flash disk as the carrier of the client-side cloud storage system, the computer is started from the flash disk, and the operating system in the system area of the flash disk is loaded into the computer's memory. The client-side cloud storage system runs as a background program when the operating system starts, and its dynamically loadable kernel module is added to the operating system kernel, so that the client-side cloud storage system appears to the user as a local file system;
2) the execution module capable of cache management scans the data block cache files in the local data cache area, obtains for each data block cache file the number of the data block it represents within the file it belongs to, and stores these numbers in an ordered list in memory;
3) the network communication module is initialized and sets up a message queue, through which it interacts with the cloud storage service on the Internet; upper-layer applications send file system commands to the virtual file system (VFS) layer through the POSIX file system interface;
4) through the VFS layer, the dynamically loadable kernel module of the client-side cloud storage system captures the commands by which upper-layer applications create, modify, read and delete files, and redirects these commands to the execution module capable of cache management;
5) the execution module capable of cache management carries out the captured commands as concrete operations: creating a file, writing data to an existing file, reading file data, and deleting a file. The new data blocks produced by these operations are cached as files in the local data cache area. When, during a write to or read from an existing file, the amount of data in the local data cache area exceeds a set threshold, the execution module is triggered to perform cache replacement on the data block cache files in the local data cache area; and when a required data block cache file is not in the local data cache area, it is obtained from the cloud storage service;
6) the network communication module stores the new data block cache files produced in step 5) back to the cloud storage service and, when a required data block cache file is not in the local data cache area, fetches it from the cloud storage service.
The features and beneficial effects of the present invention are:
1. With the method of the present invention, the local data cache area effectively speeds up the response of the client-side cloud storage system and gives a better user experience;
2. With the method of the present invention, the cloud storage data held in the local data cache area can be accessed while the network is disconnected, whereas existing client-side cloud storage systems are overly sensitive to network conditions and cannot work at all when disconnected;
3. Compared with existing client-side cloud storage systems, the method supports random reads and writes of files, so that operations such as seeking freely during video playback are possible without waiting for the whole file to be downloaded locally;
4. The method avoids transmitting, between the client-side cloud storage system and the cloud storage service, data that the other side already holds, which significantly reduces network overhead;
5. The method relieves the cloud storage service of the load caused by large numbers of read and write operations: if a data block cache file is in the local data cache area, it is operated on directly, without requesting data from the cloud storage service;
6. Compared with existing client-side cloud storage systems, the method is particularly suited to cloud storage environments in which the local storage capacity of the client-side cloud storage system is limited, the bandwidth between the client-side cloud storage system and the cloud storage service is limited, and the network between them is prone to failure.
Embodiment
The data caching method for a client-side cloud storage system proposed by the present invention is described in detail below with reference to the accompanying drawings and an embodiment:
The present invention uses a flash disk (or another storage medium such as a SIM card or an SSD) as the carrier of the cloud file system. The flash disk is divided into a system area and a data area. The system area holds the operating system, and the computer boots from the flash disk. The data area is further divided into a local data cache area and a metadata database: the local data cache area holds the data block cache files obtained from the cloud storage service, and the metadata database records the descriptive information of the cloud storage data;
The client-side cloud storage system in this method comprises a dynamically loadable kernel module, a network communication module, and an execution module capable of cache management. The method comprises the following steps:
1) with the flash disk as the carrier of the client-side cloud storage system, the computer is started from the flash disk, and the operating system in the system area of the flash disk is loaded into the computer's memory. The client-side cloud storage system runs as a background program when the operating system starts, and its dynamically loadable kernel module is added to the operating system kernel, so that the client-side cloud storage system appears to the user as a local file system;
2) the execution module capable of cache management scans the data block cache files in the local data cache area, obtains for each data block cache file the number of the data block it represents within the file it belongs to, and stores these numbers in an ordered list in memory;
3) the network communication module is initialized and sets up a message queue, through which it interacts with the cloud storage service on the Internet; upper-layer applications send file system commands to the virtual file system (VFS) layer through the POSIX file system interface;
4) through the VFS layer, the dynamically loadable kernel module of the client-side cloud storage system captures the commands by which upper-layer applications create, modify, read and delete files, and redirects these commands to the execution module capable of cache management;
5) the execution module capable of cache management carries out the captured commands as concrete operations: creating a file, writing data to an existing file, reading file data, and deleting a file. The new data blocks produced by these operations are cached as files in the local data cache area. When, during a write to or read from an existing file, the amount of data in the local data cache area exceeds a set threshold, the execution module is triggered to perform cache replacement on the data block cache files in the local data cache area; and when a required data block cache file is not in the local data cache area, it is obtained from the cloud storage service;
6) the network communication module stores the new data block cache files produced in step 5) back to the cloud storage service and, when a required data block cache file is not in the local data cache area, fetches it from the cloud storage service. The overall flow is shown in Figure 1.
The data block cache files in the local data cache area of the flash disk are used to compose the files that user programs work with. The metadata database contains three tables: a file information table (for the files usable by user programs), a data block information table, and a file composition table.
The file information table is shown in Table 1. It records the metadata of every file that the client-side cloud storage system keeps in the cloud storage service, including the file identifier, file size, file type, file name, parent directory identifier, user access permissions, and the file's creation time (Ctime), modification time (Mtime) and last access time (Vtime);
Table 1: file information table
The data block information table is shown in Table 2. It records the identifier, reference count and size of each data block stored in the cloud storage service; in this embodiment the maximum data block size can be set to 10 MB;
Table 2: data block information table
Data block identifier | Reference count | Data block size
A | 2 | 10
B | 1 | 8
C | 3 | 10
... | ... | ...
The file composition table is shown in Table 3. It records which data blocks make up each file, and contains the file identifier, the data block identifier and the data block number;
Table 3: document composition table
File identifier | Data block identifier | Data block number
1 | C | 0
1 | B | 1
... | ... | ...
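The three metadata tables can be pictured as a small relational schema. The following is a minimal sketch only, assuming SQLite as the metadata database; the source does not name a database engine, and every table name, column name and type here is hypothetical. The sample rows are the ones shown in Tables 2 and 3.

```python
import sqlite3

# Hypothetical schema: names and types are illustrative, not from the source.
SCHEMA = """
CREATE TABLE file_info (
    file_id     INTEGER PRIMARY KEY,  -- file identifier
    parent_id   INTEGER,              -- parent directory identifier
    name        TEXT NOT NULL,        -- file name
    type        TEXT,                 -- file type
    size        INTEGER,              -- file size
    permissions TEXT,                 -- user access permissions
    ctime       REAL,                 -- creation time
    mtime       REAL,                 -- modification time
    vtime       REAL                  -- last access time
);
CREATE TABLE block_info (
    block_id  TEXT PRIMARY KEY,       -- data block identifier
    ref_count INTEGER NOT NULL,       -- reference count
    size      INTEGER NOT NULL        -- data block size
);
CREATE TABLE file_blocks (
    file_id  INTEGER NOT NULL,        -- file identifier
    block_id TEXT NOT NULL,           -- data block identifier
    block_no INTEGER NOT NULL,        -- position of the block within the file
    PRIMARY KEY (file_id, block_no)
);
"""


def open_metadata_db(path=":memory:"):
    """Create the three tables in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def blocks_of_file(conn, file_id):
    """Return the data block identifiers of a file, ordered by block number."""
    rows = conn.execute(
        "SELECT block_id FROM file_blocks WHERE file_id = ? ORDER BY block_no",
        (file_id,),
    )
    return [row[0] for row in rows]


conn = open_metadata_db()
conn.executemany("INSERT INTO block_info VALUES (?, ?, ?)",
                 [("A", 2, 10), ("B", 1, 8), ("C", 3, 10)])
conn.executemany("INSERT INTO file_blocks VALUES (?, ?, ?)",
                 [(1, "C", 0), (1, "B", 1)])
print(blocks_of_file(conn, 1))
```

With the sample rows, file 1 resolves to block C followed by block B, which is the lookup the write, read and delete operations perform against the file composition table.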
The execution module capable of cache management scans the data block cache files already present in the local data cache area, obtains for each one the number of its data block within the file, and stores the result in an ordered list in memory. The ordered list is used to check quickly whether the data block cache file for a given data block identifier is in the local data cache area: if it is, the name of the corresponding data block cache file is returned; if it is not, a not-present value (generally 0) is returned;
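One way to realize such an ordered list is a sorted array searched by bisection. The sketch below is an illustration under assumptions: the class name `CachedBlockIndex`, its methods, and the `.blk` file names are hypothetical, and the source does not specify how the list is implemented. The return value 0 for a missing block follows the text above.

```python
import bisect


class CachedBlockIndex:
    """In-memory ordered list of the data blocks present in the local cache."""

    def __init__(self):
        self._ids = []    # data block identifiers, kept sorted
        self._names = []  # cache file name for the identifier at the same index

    def add(self, block_id, file_name):
        """Register a data block cache file found during the scan."""
        i = bisect.bisect_left(self._ids, block_id)
        if i < len(self._ids) and self._ids[i] == block_id:
            self._names[i] = file_name  # already known: refresh the file name
        else:
            self._ids.insert(i, block_id)
            self._names.insert(i, file_name)

    def lookup(self, block_id):
        """Return the cache file name, or 0 if the block is not cached."""
        i = bisect.bisect_left(self._ids, block_id)
        if i < len(self._ids) and self._ids[i] == block_id:
            return self._names[i]
        return 0


index = CachedBlockIndex()
index.add("C", "C.blk")
index.add("A", "A.blk")
print(index.lookup("A"), index.lookup("B"))
```

Keeping the list sorted makes each membership check a binary search rather than a scan of the cache directory.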
The file creation operation in step 5) adds the metadata of the created file to the file information table and the file composition table in the metadata database, and then backs up the database file to the cloud storage service by differential transfer;
The operation of writing data to an existing file in step 5), shown in Figure 2, comprises the following steps:
(5-11) the kernel module passes the parameters of the captured write command to the execution module capable of cache management; the command parameters include the file identifier, a pointer to the character array to be written, and the length to be written. The execution module first determines whether the data to be written span more than one data block. If so, the data are truncated so that the truncated portion falls entirely within one data block, and the remainder is written by repeating this process until everything has been written. The execution module then looks up the file information table by the file name in the command parameters to obtain the file identifier, queries the file composition table with this file identifier and the file offset in the command parameters to obtain the data block identifier, and looks up the ordered list by the data block identifier to determine whether the corresponding data block cache file is in the local data cache area;
(5-12) if the data block cache file is found not to be in the local data cache area, a data request is sent to the cloud storage service using the file identifier and data block identifier obtained in (5-11); the cloud storage service locates the corresponding data block cache file by these identifiers and returns it, and it is saved in the local data cache area. If the data block cache file is already in the local data cache area, this step is skipped;
(5-13) the data in the data block cache file are read into the computer's memory, and the character array to be written is written into this memory region according to the write command parameters obtained in (5-11). A hash value is computed over the memory region and used to query the data block information table. If the data in the memory region are identical to the data in an existing data block cache file, the reference count of that data block cache file is incremented by 1. Otherwise, the data in the memory region are written to the local data cache area as a new data block cache file, which is marked dirty. The kernel module is then notified that the write succeeded, and it in turn notifies the upper-layer application;
(5-14) according to the size of the data written, the file information table, data block information table and file composition table in the metadata database are updated, and the database file is then backed up to the cloud storage service by differential transfer;
(5-15) when the space used in the local data cache area exceeds the set threshold (generally two thirds of the total capacity), cache replacement begins. First, the data block cache files that are not marked dirty are transferred back to the cloud storage service and deleted from the local data cache area. If the space used is now below the set threshold, the cache replacement process stops. If it is still above the threshold, the data block cache files marked dirty are replaced using the LRU replacement algorithm: they are sorted by last access time and then processed in turn, each time calling the cloud storage service interface to delete the file in the cloud storage service that has the same name as the current dirty data block cache file, transferring the dirty data block cache file back to the cloud storage service, and deleting it from the local data cache area. As soon as the space used falls below the set threshold, the cache replacement process stops;
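The two-phase cache replacement of step (5-15) can be sketched as follows. This is a simplified illustration, not the patented implementation: `CacheEntry`, `RecordingCloud` and `replace_cache` are hypothetical names, the cloud interface is a stub that only records calls, and the sketch stops as soon as the space used is no longer above the threshold.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str           # data block cache file name
    size: int           # space the file occupies in the cache
    last_access: float  # last access time
    dirty: bool         # True if the file is marked dirty


class RecordingCloud:
    """Hypothetical stand-in for the cloud storage service interface."""

    def __init__(self):
        self.calls = []

    def upload(self, name):
        self.calls.append(("upload", name))

    def delete(self, name):
        self.calls.append(("delete", name))


def replace_cache(entries, threshold, cloud):
    """Evict entries in place until the space used is within the threshold."""
    used = sum(e.size for e in entries)
    evicted = []
    if used <= threshold:
        return evicted
    # Phase 1: every file not marked dirty is sent back and removed locally.
    for e in [e for e in entries if not e.dirty]:
        cloud.upload(e.name)
        entries.remove(e)
        used -= e.size
        evicted.append(e.name)
    # Phase 2: LRU over dirty files, least recently accessed first.
    for e in sorted((e for e in entries if e.dirty), key=lambda e: e.last_access):
        if used <= threshold:
            break
        cloud.delete(e.name)   # remove the same-named file in the cloud
        cloud.upload(e.name)   # store the dirty version back
        entries.remove(e)      # then drop it from the local cache
        used -= e.size
        evicted.append(e.name)
    return evicted


cache = [CacheEntry("A", 10, 1.0, False), CacheEntry("B", 10, 2.0, True),
         CacheEntry("C", 10, 3.0, True), CacheEntry("D", 10, 4.0, True)]
cloud = RecordingCloud()
# A cache of total capacity 30 has a threshold of two thirds, i.e. 20.
evicted = replace_cache(cache, threshold=30 * 2 // 3, cloud=cloud)
print(evicted, [e.name for e in cache])
```

In this run the clean file A goes first, then the least recently used dirty file B; C and D stay because the space used has dropped to the threshold.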
The operation of reading data from an existing file in step 5), shown in Figure 3, comprises the following steps:
(5-21) the kernel module passes the parameters of the captured read command to the execution module capable of cache management; the command parameters include the file identifier, a pointer to the memory buffer, and the length to be read. The execution module first determines whether the length to be read spans more than one data block of the file. If so, the length is truncated so that the truncated portion falls within one data block, and the remaining length is read by repeating this process until everything has been read. The execution module then looks up the file information table by the file name in the command parameters to obtain the file identifier, queries the file composition table with this file identifier and the file offset in the command parameters to obtain the data block identifier, and looks up the ordered list by the data block identifier to determine whether the corresponding data block cache file is in the local data cache area;
(5-22) if the data block cache file is found not to be in the local data cache area, a data request is sent to the cloud storage service using the file identifier and data block identifier obtained; the cloud storage service locates the corresponding data block cache file by these identifiers and returns it, and it is saved in the local data cache area. If the data block cache file is already in the local data cache area, this step is skipped;
(5-23) once the data block cache file to be read has been copied from the cloud storage service into the local data cache area, the execution module capable of cache management reads the data block cache file into memory, reads the requested data from it according to the read command parameters obtained in (5-21), and returns the data to the kernel module, which in turn returns them to the upper-layer application;
(5-24) the last access time entry for the file identifier that was read is updated in the file information table, and the database file is then backed up to the cloud storage service by differential transfer;
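Steps (5-11) and (5-21) both cut a request that spans several data blocks into pieces that each lie within one block. A minimal sketch of that loop is given below, assuming all blocks of a file except possibly the last share one fixed size; the function name `split_request` and its parameters are hypothetical.

```python
def split_request(offset, length, block_size):
    """Split a byte range into (block_no, offset_in_block, piece_length) pieces.

    Each piece lies entirely within one data block, so it can be served
    from a single data block cache file.
    """
    pieces = []
    while length > 0:
        block_no, in_block = divmod(offset, block_size)
        # Truncate at the end of the current block; the rest is handled
        # by the next loop iteration.
        piece = min(length, block_size - in_block)
        pieces.append((block_no, in_block, piece))
        offset += piece
        length -= piece
    return pieces


# A request for 15 units starting at offset 7, with blocks of 10 units.
print(split_request(offset=7, length=15, block_size=10))
```

The example request touches three blocks: the last 3 units of block 0, all of block 1, and the first 2 units of block 2.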
The operation of deleting an existing file in step 5), shown in Figure 4, comprises the following steps:
(5-31) the kernel module passes the parameters of the captured delete command to the execution module capable of cache management; the command parameters include the file identifier. The file composition table is queried by the file identifier to obtain the identifiers of the data blocks that make up the file;
(5-32) the data block information table is queried with the result, and the reference count of each data block identifier in the result is decremented by 1; when a reference count reaches 0, the data block identifier is added to a delete list;
(5-33) for each data block identifier in the delete list, the ordered list is queried in turn to determine whether the corresponding data block cache file is in the local data cache area; if it is, the data block cache file is deleted;
(5-34) for each data block identifier in the delete list, the file deletion interface of the cloud storage service is called in turn to delete the corresponding data block cache file in the cloud storage service;
(5-35) the metadata of the deleted file is removed from the file information table and the file composition table in the metadata database, and the database file is then backed up to the cloud storage service by differential transfer;
(5-36) for each data block identifier in the delete list, the corresponding entry in the data block information table is deleted in turn, and the corresponding records in the file information table and the file composition table are then deleted. The kernel module is notified of the success through the return value, and it in turn notifies the upper-layer application that the file was deleted successfully.
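The reference counting that ties the write path (5-13) to the delete path (5-32) and (5-36) can be illustrated with a small in-memory table. This is a sketch under assumptions: the source does not say which hash function is used or that the hash serves as the data block identifier, so SHA-256 and the hash-as-identifier choice are illustrative, and `BlockTable` with its methods is a hypothetical name.

```python
import hashlib


class BlockTable:
    """In-memory stand-in for the data block information table."""

    def __init__(self):
        self.ref_count = {}  # data block identifier -> reference count

    def store(self, data):
        """Write path: return (block_id, is_new) for a block's content.

        Identical content maps to the same identifier, so a repeated block
        only raises the reference count instead of creating a new file.
        """
        block_id = hashlib.sha256(data).hexdigest()  # assumed hash function
        is_new = block_id not in self.ref_count
        self.ref_count[block_id] = self.ref_count.get(block_id, 0) + 1
        return block_id, is_new

    def release(self, block_ids):
        """Delete path: decrement counts and return the delete list."""
        delete_list = []
        for block_id in block_ids:
            self.ref_count[block_id] -= 1
            if self.ref_count[block_id] == 0:
                delete_list.append(block_id)
        return delete_list

    def purge(self, delete_list):
        """Remove the entries of blocks that are no longer referenced."""
        for block_id in delete_list:
            del self.ref_count[block_id]


table = BlockTable()
id_x, new_x = table.store(b"same data")
id_x2, new_x2 = table.store(b"same data")   # duplicate content
id_y, new_y = table.store(b"other data")
to_delete = table.release([id_x, id_y])     # deleting a file that uses x and y
table.purge(to_delete)
print(len(to_delete), table.ref_count[id_x])
```

Only the block whose count reaches 0 lands in the delete list; the shared block survives with its count reduced, which is why deleting one file does not remove data blocks still used by another.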