CN106021381A

CN106021381A - Data access/storage method and device for cloud storage service system

Info

Publication number: CN106021381A
Application number: CN201610311690.1A
Authority: CN
Inventors: 罗安宁
Original assignee: Beijing Sohu New Media Information Technology Co Ltd
Current assignee: Beijing Sohu New Media Information Technology Co Ltd
Priority date: 2016-05-11
Filing date: 2016-05-11
Publication date: 2016-10-12

Abstract

Embodiments of the invention disclose a data caching method and device. In the method, a metadata cache region and a data file cache region are established in a client. The metadata cache region stores the metadata of each data file, where metadata is information that describes a data file; the data file cache region stores the data blocks of each data file. Each data file is split into a plurality of fixed-size data blocks, an independent lock is configured for each data block, and data is updated to the cloud or read from the cloud into the local cache in units of data blocks. Because the client stores metadata and data files through a cache mechanism, the number of I/O (input/output) operations can be reduced; because data files are stored as data blocks, concurrency can be increased, which supports an efficient read-write mechanism.

Description

A data access/storage method and device for a cloud storage service system

Technical field

The present invention relates to the field of cloud storage, and in particular to a data access/storage method and device for a cloud storage service system.

Background art

Cloud storage is a network storage technology. Through functions such as cluster applications, network technology, or distributed file systems, it uses application software to bring together a large number of storage devices of various types in a network so that they work cooperatively and jointly provide data storage and service access functions to the outside. Simply put, cloud storage is a scheme that places storage resources in the cloud for people to access: users can conveniently access data at any time and from any place by connecting to the cloud through any network-capable device.

At present, a client generally mounts the shared storage disk of a cloud storage service locally, as a mounted disk, to achieve remote access. Commonly used approaches are the Network File System (NFS) and Filesystem in Userspace (FUSE). NFS is one of the file systems supported by FreeBSD; it allows computers in a network to share resources over a TCP/IP network. With NFS, a local NFS client application can transparently read and write files located on a cloud-side (cloud storage server) NFS server, just as if it were accessing local files. FUSE is a Linux module used to mount certain network spaces onto the local file system.

At this stage, most network storage systems, such as GlusterFS, HDFS, and Amazon AWS, have FUSE-based client file systems. These clients can mount a network disk device that resides in the cloud onto a local directory. Among them, s3fs-fuse is an open source project developed for the Amazon S3 cloud platform that implements a user-space file system based on FUSE. It provides management of metadata and data, but it does not further partition and manage the cache: s3fs-fuse provides a single lock for each file. This means that while one part of a file is being updated, other parts cannot be modified; file data can only be accessed and modified sequentially, so concurrency is lacking.

Other systems similar to s3fs-fuse that manage each file with a single lock have the same lack of concurrency, which reduces the efficiency with which users access and modify cloud data and degrades the user experience.

Summary of the invention

In order to solve the above problems in the prior art, the present invention provides a data caching method and device. The client stores metadata and data files through a cache mechanism, which reduces the number of I/O operations, and stores data files as data blocks, which increases concurrency and thereby supports an efficient read-write mechanism.

A first aspect of the present application provides a data caching method, the method including:

establishing a metadata cache region and a data file cache region in a client, wherein the metadata cache region is used to store the metadata of each data file, metadata being information that describes a data file, and the data file cache region is used to store the data blocks of each data file;

splitting each data file into a plurality of fixed-size data blocks and configuring an independent lock for each data block, so that data is updated to the cloud, or read from the cloud into the local cache, in units of data blocks.

Optionally, establishing the metadata cache region and the data file cache region in the client includes:

establishing the metadata cache region in the memory of the client, so that metadata is cached in memory;

establishing the data file cache region on the disk of the client, so that data files are cached on disk.

Optionally, the method further includes:

receiving a read request from the operating system kernel, the read request including the name of the file to be read and the data segment to be read;

searching the data file cache region, according to the content recorded in the metadata cache region, for the data file corresponding to the file name;

comparing the current version of the data block to be read with the current version of the file to be read;

if the current version of the data block is lower than the current version of the file, updating the data block from the cloud and, after the update, reading the data from the data file cache and returning it;

if the current version of the data block is the same as the current version of the file, reading the data from the data file cache and returning it.

Optionally, while a data block is being updated, if the entity tag (ETag) returned by the network request differs from the ETag recorded in the metadata cache for the data file, the current version of the data file and the metadata cache are updated, and after the update the method returns to the comparison operation. The ETag is an identifier associated with a web resource.

Optionally, the version of the data block is set as an integer atomic variable.

Optionally, while one data block is being read, several adjacent data blocks are pre-read at the same time.

Optionally, the method further includes:

receiving a write request from the operating system kernel, the write request including the name of the file to be written and the data to be written;

acquiring the lock of the data block to be written, engaging the lock, and writing the data into the corresponding cache region;

during writing, uploading data blocks that have already been written to the cloud in advance;

when all data blocks have been uploaded, modifying the corresponding metadata in the metadata cache.

Optionally, uploading data blocks that have already been written to the cloud in advance during writing includes:

monitoring the write process and determining whether the position of the written data has passed the end of a data block and whether the upload status of that data block is marked as pending upload; if so, uploading the data block to the cloud in advance in an asynchronous manner and marking its upload status as uploaded.

A second aspect of the present application provides a data caching device, the device including:

a cache region establishing unit, configured to establish a metadata cache region and a data file cache region in a client, wherein the metadata cache region is used to store the metadata of each data file, metadata being information that describes a data file, and the data file cache region is used to store the data blocks of each data file;

a data block dividing and caching unit, configured to split each data file into a plurality of fixed-size data blocks and configure an independent lock for each data block, so that data is updated to the cloud, or read from the cloud into the local cache, in units of data blocks.

Optionally, the cache region establishing unit includes:

a metadata cache region establishing subunit, configured to establish the metadata cache region in the memory of the client, so that metadata is cached in memory;

a data file cache region establishing subunit, configured to establish the data file cache region on the disk of the client, so that data files are cached on disk.

Optionally, the device further includes:

a read request response unit, configured to receive a read request from the operating system kernel, the read request including the name of the file to be read and the data segment to be read;

a search unit, configured to search the data file cache region, according to the content recorded in the metadata cache region, for the data file corresponding to the file name;

a comparison unit, configured to compare the current version of the data block to be read with the current version of the file to be read;

a first reading unit, configured to, if the current version of the data block is lower than the current version of the file, update the data block from the cloud and, after the update, read the data from the data file cache and return it;

a second reading unit, configured to, if the current version of the data block is the same as the current version of the file, read the data from the data file cache and return it.

Optionally, the first reading unit includes:

an update subunit, configured to, while a data block is being updated, update the current version of the data file and the metadata cache if the entity tag (ETag) returned by the network request differs from the ETag recorded in the metadata cache for the data file, and to return to the comparison operation after the update. The ETag is an identifier associated with a web resource.

Optionally, the device further includes:

a setting unit, configured to preset the version of the data block as an integer atomic variable.

Optionally, the device further includes:

a pre-read unit, configured to pre-read several adjacent data blocks while the first reading unit or the second reading unit is reading one data block.

Optionally, the device further includes:

a write request response unit, configured to receive a write request from the operating system kernel, the write request including the name of the file to be written and the data to be written;

a data writing unit, configured to acquire the lock of the data block to be written, engage the lock, and write the data into the corresponding cache region;

a data uploading unit, configured to upload data blocks that have already been written to the cloud in advance during writing;

a metadata updating unit, configured to modify the corresponding metadata in the metadata cache when all data blocks have been uploaded.

Optionally, the data uploading unit includes:

an upload subunit, configured to monitor the write process and determine whether the position of the written data has passed the end of a data block and whether the upload status of that data block is marked as pending upload; and, if so, to upload the data block to the cloud in advance in an asynchronous manner and mark its upload status as uploaded.

Compared with the prior art, the technical solution of the present application has the following advantages:

A metadata cache region and a data file cache region are established in the client. The metadata cache region stores the metadata of each data file, where metadata is information that describes a data file; the data file cache region stores the data blocks of each data file. Each data file is split into a plurality of fixed-size data blocks, and an independent lock is configured for each data block, so that data is updated to the cloud, or read from the cloud into the local cache, in units of data blocks. With the cache structure proposed by the present application, a data file is divided and managed in the form of data blocks, so that it can be accessed and modified concurrently, which increases concurrency. In addition, a version is recorded separately for each data block and for the file, so data can be refreshed in time by comparing versions, which satisfies the requirement of data consistency.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a data caching method provided by the present application;

Fig. 2 is a schematic diagram of the data cache structure provided by the present application;

Fig. 3 is a flowchart of the data reading method provided by the present application;

Fig. 4 is a flowchart of the data writing method provided by the present application;

Fig. 5 is a schematic structural diagram of a data caching device provided by the present application.

Detailed description of the invention

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, which is a flowchart of a data caching method provided by the present application, the method may include the following steps 101 and 102.

Step 101: establish a metadata cache region and a data file cache region in the client, wherein the metadata cache region is used to store the metadata of each data file, metadata being information that describes a data file, and the data file cache region is used to store the data blocks of each data file.

Step 102: split each data file into a plurality of fixed-size data blocks and configure an independent lock for each data block, so that data is updated to the cloud, or read from the cloud into the local cache, in units of data blocks.

To explain the cache structure proposed by the present application more clearly, it is described further below with reference to Fig. 2.

As shown in Fig. 2, the cache structure provided by the present application includes two parts: a metadata cache region and a data file cache region. In an operating system, metadata is accessed more frequently, while the data itself occupies more disk space than the metadata. To make better use of client system resources, the present application provides a preferred implementation, which includes:

establishing the metadata cache region in the memory of the client, so that metadata is cached in memory; and establishing the data file cache region on the disk of the client, so that data files are cached on disk.
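A minimal Python sketch of this two-region layout follows. It is an illustration under stated assumptions, not the patent's implementation: the class name `DataFileCache`, the hashed cache-file naming, the temporary cache directory, and the use of a plain dictionary as the in-memory metadata region are all hypothetical.

```python
import hashlib
import os
import tempfile

# Metadata region: kept in memory (hypothetical layout: file name -> attributes).
metadata_cache = {}


class DataFileCache:
    """Data file region: one local cache file per cloud file, stored on disk."""

    def __init__(self, block_size, cache_dir=None):
        self.block_size = block_size
        # Assumption: a fresh temporary directory serves as the cache directory.
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="blockcache-")

    def _path(self, name):
        # Hash the file name so that any cloud key maps to a valid local name.
        digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def write_block(self, name, index, data):
        """Store one block at its fixed offset inside the local cache file."""
        path = self._path(name)
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(index * self.block_size)
            f.write(data)

    def read_block(self, name, index):
        """Read one block back from the local cache file."""
        with open(self._path(name), "rb") as f:
            f.seek(index * self.block_size)
            return f.read(self.block_size)
```

Keeping the small, frequently read metadata in a dictionary avoids disk access on lookups, while the larger block data lives in ordinary files addressed by block index.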

The data file cache region caches data files. When there is an access request for a data file in the cloud, a corresponding local cache file is created dynamically, and this cache file is deleted when the file descriptor is finally closed.

In the present application, a data file is not cached as a whole file; instead it is split into a plurality of fixed-size data blocks. The fixed size can be configured when the program starts. When read and write requests are handled, data is therefore updated to the cloud, or read from the cloud into the local cache, in units of data blocks.
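As an illustration of working in units of fixed-size blocks, the following Python sketch maps a requested byte range onto block indices. The function name `block_range` and its parameters are hypothetical and do not appear in the source.

```python
def block_range(offset: int, length: int, block_size: int) -> range:
    """Return the indices of the blocks touched by bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

For example, with a block size of 4, a 10-byte request starting at offset 0 touches blocks 0, 1, and 2, so only those three blocks need to be locked, fetched, or uploaded.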

Each data block must be configured with a read-write lock, which blocks client read requests while the block is being updated from the cloud. Each data block also has a current version, which is an integer atomic variable. An integer atomic variable remains thread-safe without locking and can perform an increment in a single CPU instruction, so it is efficient. In addition, each cache file has its own current version variable. The current file version is determined by the ETag, a value returned by the cloud that is the MD5 value of the file.
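The per-block state described here could be sketched in Python roughly as follows. The Python standard library has neither a read-write lock nor a hardware atomic integer, so this sketch substitutes a plain `threading.Lock` and a lock-guarded counter; the class names `AtomicInt`, `Block`, and `CachedFile` are hypothetical.

```python
import threading


class AtomicInt:
    """Integer counter guarded by a lock (stand-in for a hardware atomic)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value


class Block:
    """One fixed-size block with its own lock and its own version."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()  # the text calls for a read-write lock
        self.version = AtomicInt()


class CachedFile:
    """Per-file state: a file-level version plus independently locked blocks."""

    def __init__(self, name, block_size):
        self.name = name
        self.block_size = block_size
        self.etag = None
        self.version = AtomicInt()
        self._blocks = {}
        self._blocks_lock = threading.Lock()

    def block(self, index):
        # Create block state lazily; the small lock only protects the dict.
        with self._blocks_lock:
            if index not in self._blocks:
                self._blocks[index] = Block(index)
            return self._blocks[index]

    def is_stale(self, index):
        # A block is stale when its version lags behind the file version.
        return self.block(index).version.get() < self.version.get()
```

Because every block carries its own lock, updating block 0 does not prevent another thread from working on block 1 of the same file.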

The present application determines whether a data block in the cache region is consistent with the cloud by comparing the current version of the data block with the current version of the cache file.

Based on the caching scheme above, the present application also provides a corresponding data reading scheme and data writing scheme, each explained below through an embodiment.

Referring to Fig. 3, which is a flowchart of the data reading method provided by the present application: the data reading method is implemented on the basis of the caching scheme shown in Fig. 1. As shown in Fig. 3, the method includes the following steps 301 to 305.

Step 301: receive a read request from the operating system kernel, the read request including the name of the file to be read and the data segment to be read.

Step 302: search the data file cache region, according to the content recorded in the metadata cache region, for the data file corresponding to the file name.

Step 303: compare the current version of the data block to be read with the current version of the file to be read.

Step 304: if the current version of the data block is lower than the current version of the file, update the data block from the cloud and, after the update, read the data from the data file cache and return it.

Step 305: if the current version of the data block is the same as the current version of the file, read the data from the data file cache and return it.

In implementation, the client receives a kernel request forwarded by a system such as FUSE or NFS; the request includes the name of the file to be accessed and the data segment to be accessed. The client then compares the current version of the data block to be accessed with the current version of the file. When the current version of the data block is lower than the current version of the file, the data block is updated; at the same time, asynchronous update requests may be issued for adjacent data blocks. When the current version of the data block is the same as the current version of the file, the data is read directly from the cache and returned.
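The version comparison in steps 303 to 305 can be sketched in Python as below. This is a simplified illustration: the `Block` class, the `read_block` function, and the `fetch_from_cloud` callable are hypothetical names, the block data is held in memory rather than on disk, and a plain lock replaces the read-write lock.

```python
import threading


class Block:
    """Cached block: data plus the file version it was last synced at."""

    def __init__(self, index):
        self.index = index
        self.version = 0
        self.data = b""
        self.lock = threading.Lock()


def read_block(block, file_version, fetch_from_cloud):
    """Return block data, refreshing it first if it lags the file version."""
    with block.lock:
        if block.version < file_version:
            # Step 304: the block is stale, so update it from the cloud.
            block.data = fetch_from_cloud(block.index)
            block.version = file_version
        # Step 305 (or after the update): serve the data from the cache.
        return block.data
```

A second read at the same file version is served from the cache without contacting the cloud; only a later increase of the file version triggers another fetch.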

While a data block is being updated, if the ETag returned by the network request does not match the ETag recorded in the metadata cache for the current version, the current version of the file and the metadata cache are updated, and the flow returns to step 303 to compare again. After the data update finishes, the data is read from the data file cache and returned.

The ETag is the MD5 value of the online data file, as returned by the cloud; therefore, when the data file changes, the ETag changes as well. When the ETag changes, the file version stored in system memory is updated automatically; the file version is an integer atomic variable. In this way, consistency between online and offline data can be ensured by comparing ETags.
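The following short Python sketch shows how an MD5-based ETag can be computed and compared. The helper names `md5_etag` and `etag_changed` are hypothetical. Real object stores do not always use a plain MD5 as the ETag (multipart uploads are a common exception), so this sketch only illustrates the case the text describes.

```python
import hashlib


def md5_etag(data: bytes) -> str:
    """Hex MD5 digest of the file content, as the text describes the ETag."""
    return hashlib.md5(data).hexdigest()


def etag_changed(cached_etag: str, cloud_etag: str) -> bool:
    """Compare two ETags, ignoring the quotes HTTP servers often add."""
    return cached_etag.strip('"') != cloud_etag.strip('"')
```

If `etag_changed` returns true, the client would treat the cached file as out of date and bump the file version, which in turn invalidates every cached block of that file.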

During reads and writes of a data file, whenever a version change is detected online, the current version of the file is incremented by 1, which may leave the current version of a data block lower than the current version of the file. In that case all cached data of the file is detected as invalid and must be updated to the latest version on the next access.

After a data block is updated, its current version is recorded as the current version of the file. Meanwhile, the ETag returned by each data update request is compared with the ETag in the metadata cache. If they are inconsistent, the data in the cloud has been updated again; the current version of the file is then incremented by 1 again and the above operations are repeated until the local cached data is consistent with the cloud.
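The repeat-until-consistent update described above might look like the following Python sketch. The dictionary layout of `file_state` and `block`, the function name `refresh_block`, and the `fetch` callable returning a `(data, etag)` pair are assumptions made for illustration. The loop is unbounded, as in the text; a production client would probably cap the number of retries.

```python
def refresh_block(file_state, block, fetch):
    """Update a stale block, retrying if the cloud copy changed meanwhile.

    file_state: {"version": int, "etag": str}   (metadata cache entry)
    block:      {"index": int, "version": int, "data": bytes}
    fetch:      callable(index) -> (data, etag)  (hypothetical cloud request)
    """
    while block["version"] < file_state["version"]:
        target_version = file_state["version"]
        data, etag = fetch(block["index"])
        if etag != file_state["etag"]:
            # The cloud copy changed again: record the new ETag, bump the
            # file version, and go back to the comparison step.
            file_state["etag"] = etag
            file_state["version"] += 1
            continue
        block["data"] = data
        block["version"] = target_version
    return block["data"]
```

Each mismatch raises the file version, so the block stays stale and the loop fetches again; the loop ends only once a fetch returns the ETag already recorded in the metadata.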

When multiple threads or multiple clients read and write, data consistency must be considered. When cloud data is modified, the modification should be reflected in the client's data in time. To achieve this, the metadata cache needs a short timeout so that it can be refreshed promptly. When a file is accessed, the file's ETag is compared with the ETag recorded in the metadata cache. If they are inconsistent, the file data cache and the file metadata cache are updated again.
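A metadata cache with a short timeout can be sketched in Python as follows. The class name `MetadataCache`, the one-second default, and the injectable `clock` parameter are assumptions; the source does not specify a timeout value.

```python
import time


class MetadataCache:
    """In-memory metadata cache whose entries expire after a short timeout."""

    def __init__(self, ttl_seconds=1.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries = {}   # file name -> (metadata, expiry time)

    def put(self, name, metadata):
        self._entries[name] = (metadata, self._clock() + self._ttl)

    def get(self, name):
        """Return cached metadata, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        metadata, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None
        return metadata
```

When `get` returns `None`, the client would request the metadata from the cloud again and compare the returned ETag with the one it held before, as described above.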

Referring to Fig. 4, which is a flowchart of the data writing method provided by the present application: the data writing method is implemented on the basis of the caching scheme shown in Fig. 1. As shown in Fig. 4, the method includes the following steps 401 to 404.

Step 401: receive a write request from the operating system kernel, the write request including the name of the file to be written and the data to be written.

Step 402: acquire the lock of the data block to be written, engage the lock, and write the data into the corresponding cache region.

Step 403: during writing, upload data blocks that have already been written to the cloud in advance.

In implementation, step 403 may include: monitoring the write process and determining whether the position of the written data has passed the end of a data block and whether the upload status of that data block is marked as pending upload; if so, uploading the data block to the cloud in advance in an asynchronous manner and marking its upload status as uploaded.

Step 404: when all data blocks have been uploaded, modify the corresponding metadata in the metadata cache.

Similar to the reading method flow shown in Fig. 3, the writing method provided by the present application first receives a kernel request forwarded by FUSE or NFS; the request includes the name of the file to be written and the data. The corresponding cache file is then located according to the content recorded in the metadata cache region; if it does not exist, a new cache file is created. After the lock of the data block is acquired, the data is written into the cached data block, and the upload status of the modified data is marked as 0, where 0 identifies the pending-upload state. If the last written position has passed the end of a data block and the upload status of that data block is 0, the data block is uploaded asynchronously and its upload status is changed to 1, where 1 identifies the uploaded state. After the file has been uploaded, the metadata cache is modified.
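The write path with early upload could be sketched in Python as below. It is an illustration under assumptions: the class name `WriteSession`, the `upload(index, data)` callable, and the in-memory block buffers are hypothetical, and the sketch assumes a single writer thread per file. Status values 0 and 1 follow the text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PENDING, UPLOADED = 0, 1  # upload status values used in the text


class WriteSession:
    def __init__(self, block_size, upload, workers=2):
        self.block_size = block_size
        self.upload = upload  # hypothetical callable: upload(index, data)
        self.blocks = {}      # block index -> bytearray
        self.status = {}      # block index -> PENDING or UPLOADED
        self.locks = {}       # block index -> lock
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = []

    def write(self, offset, data):
        pos, view = offset, memoryview(data)
        while len(view) > 0:
            index, start = divmod(pos, self.block_size)
            n = min(len(view), self.block_size - start)
            with self.locks.setdefault(index, threading.Lock()):
                buf = self.blocks.setdefault(index, bytearray())
                if len(buf) < start + n:
                    buf.extend(bytes(start + n - len(buf)))
                buf[start:start + n] = view[:n]
                self.status[index] = PENDING
            pos += n
            view = view[n:]
        # Upload early every pending block whose end the writer has passed.
        self._flush(lambda index: pos >= (index + 1) * self.block_size)

    def _flush(self, ready):
        for index, state in list(self.status.items()):
            if state == PENDING and ready(index):
                self.status[index] = UPLOADED
                payload = bytes(self.blocks[index])
                self.futures.append(
                    self.pool.submit(self.upload, index, payload))

    def close(self):
        # Upload what is still pending, wait for all uploads, then the
        # caller can update the metadata cache.
        self._flush(lambda index: True)
        for future in self.futures:
            future.result()
        self.pool.shutdown()
```

With a block size of 4, writing 6 bytes at offset 0 submits block 0 for upload immediately, while the partially filled block 1 stays pending until more data arrives or the session is closed.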

The main issues in achieving efficient reads and writes are reducing the number of I/O operations and increasing concurrency. To reduce I/O, the technical solution of the present application therefore adds a cache layer for the data and metadata to be accessed. In an operating system, metadata is accessed more frequently, while the data itself occupies more disk space than the metadata. Here file metadata is cached separately from file data: metadata is kept in memory, and data is kept on disk. File data is further split, according to file size, into several equal-sized data blocks. With this caching approach, while a file is being read a background thread can pre-read several adjacent data blocks at the same time, improving read efficiency; while a file is being written, a background thread can upload data blocks that have already been written in advance. Because the background thread runs concurrently with the client's read and write operations, the data is already present locally when the client reads the next data block, so performance can approach that of local access.
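The background pre-read of adjacent blocks can be sketched in Python with a thread pool. The class name `PrefetchingReader`, the `read_ahead` parameter, and the `fetch(index)` callable are hypothetical; the source does not state how many adjacent blocks are pre-read. For brevity the sketch never evicts completed fetches.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingReader:
    """Reads one block and schedules the next few blocks in the background."""

    def __init__(self, fetch, total_blocks, read_ahead=2):
        self.fetch = fetch  # hypothetical callable: fetch(index) -> bytes
        self.total_blocks = total_blocks
        self.read_ahead = read_ahead
        self.pool = ThreadPoolExecutor(max_workers=max(1, read_ahead))
        self.pending = {}   # block index -> Future

    def _schedule(self, index):
        if 0 <= index < self.total_blocks and index not in self.pending:
            self.pending[index] = self.pool.submit(self.fetch, index)

    def read(self, index):
        if not 0 <= index < self.total_blocks:
            raise IndexError(index)
        self._schedule(index)
        # Pre-read the adjacent blocks while the caller consumes this one.
        for ahead in range(index + 1, index + 1 + self.read_ahead):
            self._schedule(ahead)
        return self.pending[index].result()
```

In a sequential scan, by the time the caller asks for block 1 its fetch has already been submitted by the read of block 0, so the caller waits less, or not at all.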

Based on the method provided by the above embodiments, the embodiments of the present invention also provide a corresponding device, whose working principle is described in detail below with reference to the drawings.

Device embodiment

Referring to Fig. 5, which is a structural diagram of a data caching device provided by the present invention: as shown in Fig. 5, the device may include the following units:

a cache region establishing unit 501, configured to establish a metadata cache region and a data file cache region in a client, wherein the metadata cache region is used to store the metadata of each data file, metadata being information that describes a data file, and the data file cache region is used to store the data blocks of each data file;

a data block dividing and caching unit 502, configured to split each data file into a plurality of fixed-size data blocks and configure an independent lock for each data block, so that data is updated to the cloud, or read from the cloud into the local cache, in units of data blocks.

Optionally, the cache region establishing unit 501 includes:

Optionally, the device further includes:

Optionally, the first reading unit includes:

Optionally, the device further includes:

Optionally, the data uploading unit includes:

It should be noted that a person of ordinary skill in the art will understand that all or part of the flows in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The embodiments in this specification are described progressively; for the parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, it is described relatively simply; for the relevant parts, see the description of the method embodiment. The device embodiment described above is only illustrative; the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

The above is only a specific implementation of the present invention. It should be noted that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A data caching method, characterized in that the method includes:

2. The method according to claim 1, characterized in that establishing the metadata cache region and the data file cache region in the client includes:

3. The method according to claim 1, characterized in that the method further includes:

4. The method according to claim 3, characterized in that, while a data block is being updated, if the entity tag (ETag) returned by the network request differs from the ETag recorded in the metadata cache for the data file, the current version of the data file and the metadata cache are updated, and after the update the method returns to the comparison operation; the ETag is an identifier associated with a web resource.

5. The method according to claim 3, characterized in that the version of the data block is set as an integer atomic variable.

6. The method according to claim 3, characterized in that

while one data block is being read, several adjacent data blocks are pre-read at the same time.

8. The method according to claim 7, characterized in that uploading data blocks that have already been written to the cloud in advance during writing includes:

9. A data caching device, characterized in that the device includes:

10. The device according to claim 9, characterized in that the cache region establishing unit includes:

11. The device according to claim 9, characterized in that the device further includes:

12. The device according to claim 11, characterized in that the first reading unit includes:

13. The device according to claim 11, characterized in that the device further includes:

14. The device according to claim 11, characterized in that the device further includes:

15. The device according to claim 9, characterized in that the device further includes:

16. The device according to claim 15, characterized in that the data uploading unit includes: