CN107111615A - Data caching method and device for a distributed storage system

Data caching method and device for a distributed storage system

Info

Publication number
CN107111615A
CN107111615A (application CN201480078749.6A)
Authority
CN
China
Prior art keywords
cache
data
data node
file
caching
Prior art date
Legal status
Pending
Application number
CN201480078749.6A
Other languages
Chinese (zh)
Inventor
李挥
郭涵
Current Assignee
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School
Publication of CN107111615A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data caching method for a distributed storage system, comprising the following steps: the name node loads a configuration file into its memory and parses the configuration file; the information about caching in the configuration file is obtained, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching; when handshaking with each data node, the name node transmits to that data node the cache information that concerns it; according to the cache information it receives, the data node reads the data to be cached and stores it in its memory; after the caching is complete, the data node notifies the name node at the next handshake. The invention further relates to a device implementing the above method. Implementing the data caching method and device for a distributed storage system of the present invention has the following advantages: the structure is simple, and little internal data exchange is required.

Description

Data caching method and device for a distributed storage system
Technical field
The present invention relates to distributed storage systems, and more particularly to a data caching method and device for a distributed storage system.
Background art
At present, the way information is stored and processed is constantly evolving. The storage and computation of massive data cannot be completed on a single computer, and distributed storage and computing systems are increasingly becoming mainstream. When a distributed storage system is used, an excellent distributed file system is generally required to manage the storage. A distributed file system (Distributed File System) is one in which the physical storage resources managed by the file system are not necessarily attached to the local node but are connected to the nodes through a computer network. The design of distributed file systems is based on the client/server model. A typical network may include multiple servers accessed by multiple users. In addition, the peer-to-peer nature allows some systems to play the dual roles of client and server. Many excellent distributed file systems have appeared in recent years, such as Google's GFS, OpenStack's Swift, and Hadoop's HDFS, all of which are outstanding distributed file system implementations. Take the Hadoop distributed file system as an example. HDFS was originally developed as the infrastructure of the Apache Nutch search engine project and is now part of the Apache Hadoop Core project. It has much in common with existing distributed file systems, but at the same time its differences from other distributed file systems are also apparent. HDFS is a fault-tolerant system suited to deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications with large data sets. HDFS relaxes some POSIX constraints in order to support streaming reads of file system data. As a mature distributed file system, HDFS is one of the best choices for building a massive-data storage platform; for example, using HDFS as the storage platform in the field of video storage has notable advantages.
In general, in a distributed storage system, caches are used everywhere to accelerate the various links of the system. In the broadest sense, a cache can be added wherever data flows; caches can therefore be placed at the client, in CDN caches, in proxy server caches, and in the data nodes of the distributed storage itself. Caches added inside a distributed file system can be divided into different types of cache design according to different needs. By position, a cache can be added to the metadata server or to the data node servers; by physical medium, disk, flash memory, or main memory can be used as caching tools of different levels; by function, there are metadata caches, retrieval record caches, file list caches, data content caches, and so on.
For example, an HDFS system with a distributed caching structure, called HDCache, has been proposed.
HDCache adds a caching system on top of HDFS: a cache hierarchy is designed and implemented above HDFS to provide better service for large-scale, real-time, fast access in the cloud. HDCache introduces a distributed caching mechanism into the whole system in a loosely coupled way, which is convenient for management, deployment, replacement, and upgrading, without introducing any hidden risk into HDFS itself. Following the traditional C/S architecture, the design of HDCache provides a dynamic link library on the client side, and every application that uses HDCache must integrate this library into its own client; on the server side, a daemon runs on every server on which the HDCache system is deployed. HDCache has the following defects: the client must integrate the cache code library, namely the dynamic link library of the memory-cache framework, which makes the client bulkier and more prone to software coordination failures, affecting the user experience. In addition, the system adds caching proxy servers in a layer outside HDFS; it does not improve HDFS itself but builds a caching service on top of it. This does not solve the bottleneck of HDFS's own slow disk reads; it merely uses the cache as a shield in the outer layer. From an implementation point of view, this system provides a solution from the usage angle and does not improve the access performance of HDFS itself.
As another example, a method has been proposed for using a distributed cache effectively in a loosely coupled VOD system without a management node, together with a new cache algorithm called SCC (Scalable and Cooperative Caching). The SCC algorithm has two aspects: one is the scheduling-level algorithm, which designs a set of value models to evaluate servers; the other is the implementation of the caching technique. However, this method still has several problems: in a distributed storage system with a very large number of nodes, SCC proposes no usable strategy for cache consistency; second, when the number of nodes is very large, each node needs to store the file metadata of all other nodes, and the large amount of internal data exchange has a great impact on the system itself.
Summary of the invention
The technical problem to be solved by the present invention is that the prior art described above is complicated in structure and may involve a large amount of internal data exchange. The invention provides a data caching method and device for a distributed storage system that is simple in structure and requires little internal data exchange.
The technical solution adopted by the present invention to solve this technical problem is to construct a data caching method for a distributed storage system, the distributed storage system comprising multiple data nodes for storing data and a name node connected to the multiple data nodes through a network and managing the multiple data nodes, the method comprising the following steps:
A) the name node loads a configuration file into its memory and parses the configuration file;
B) obtain the information about caching in the configuration file, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching;
C) when handshaking with each data node, the name node transmits to that data node the cache information that concerns it;
D) according to the cache information it receives, the data node reads the data to be cached and stores it in its memory; after the caching is complete, the data node notifies the name node at the next handshake.
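By way of illustration only, the following minimal Java sketch outlines the name-node side of steps A) to D). Every class, field, and method name in it is hypothetical (none is taken from the patent); it only shows how cache directives can be piggybacked on the existing handshake so that no extra messages are needed:

```java
import java.util.*;

// Hypothetical sketch of the name-node flow in steps A)-D); all names are illustrative.
class CacheDirective {
    final String path;      // cache path of a data block
    final int replicas;     // number of replicas to cache
    CacheDirective(String path, int replicas) { this.path = path; this.replicas = replicas; }
}

class NameNodeCacheCoordinator {
    // pending directives grouped by the data node that stores the block
    private final Map<String, List<CacheDirective>> perNode = new HashMap<>();

    // Steps A/B: parse the configuration and group the directives by data node.
    void loadConfiguration(Map<String, List<CacheDirective>> parsed) {
        perNode.putAll(parsed);
    }

    // Step C: called during the regular handshake; returns the pending directives
    // for this data node and clears them (piggybacking on the handshake, so no
    // extra message is needed).
    List<CacheDirective> onHandshake(String dataNodeId) {
        List<CacheDirective> pending = perNode.remove(dataNodeId);
        return pending == null ? Collections.emptyList() : pending;
    }

    // Step D (second half): the data node reports the result at the next handshake.
    void onCacheReport(String dataNodeId, String path, boolean cached) {
        if (!cached) {
            // re-queue, or try another replica; the description elaborates on this
            perNode.computeIfAbsent(dataNodeId, k -> new ArrayList<>())
                   .add(new CacheDirective(path, 1));
        }
    }
}
```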
Further, the method further comprises the following steps:
E) the client sends a read request to the name node; the name node checks whether the requested data has been cached; if so, it returns the cache location to the client; if not, the request is processed as a normal read request;
F) the client sends a data read request to a data node; after receiving the data read request, the data node searches its local cache list; if the data is found, it sends the data to the client directly from its memory; if not, it reads its disk to obtain the data, sends the data to the client, and at the same time notifies the name node of the cache anomaly.
Further, the configuration file is defined by the client and transferred to the name node; on the name node the configuration file is converted into a cache policy file, which contains multiple objects each representing a file to be cached; through a periodic scanning thread, the name node obtains the cache policy file in each period and replaces the copy in its memory.
Further, the step B) further comprises:
B1) obtain an object in the cache policy file;
B2) judge whether the path that the object points to is a file or a directory; if it is a file, execute step B3); if it is a directory, resolve it into multiple files and then execute step B3);
B3) construct the metadata representing the file to be cached, the metadata including the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period; wherein the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block; B4) repeat the above steps B1)-B3) until all the objects in the cache policy file have been processed.
Further, the step B) further comprises the following step:
gather the data-block cache paths in all the block cache descriptors of the metadata obtained for each processed object, to obtain the list of data blocks that the current scan needs to cache.
Further, the step C) further comprises:
C1) take out a set number of data-block cache paths from the list of data blocks that the current scan needs to cache;
C2) for the data node that each cache path points to, send the data-block cache path to that data node when handshaking with it.
Further, the step D) further comprises:
D1) obtain all the data-block cache paths received in this handshake, read each data block according to its cache path in turn, and cache it in the memory of the data node; if a data block is not found, or the memory space is insufficient for caching, mark that data-block caching as failed and process the next data-block cache path;
D2) return the caching results of all the data blocks to the name node at the next handshake.
The invention further relates to a device implementing the above method. The distributed storage system comprises multiple data nodes for storing data and a name node connected to the multiple data nodes through a network and managing the multiple data nodes, and the device comprises:
Configuration file parsing module: for loading a configuration file into the memory of the name node and parsing the configuration file;
Cache information acquisition module: for obtaining the information about caching in the configuration file, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching;
Cache information sending module: for making the name node transmit to each data node, when handshaking with it, the cache information that concerns that data node;
Cache realization module: for making the data node read the data to be cached according to the cache information it receives and store it in its memory; after the caching is complete, the data node notifies the name node at the next handshake.
Further, the device further comprises: Read request module: for making the client send a read request to the name node; the name node checks whether the requested data has been cached; if so, it returns the cache location to the client; if not, the request is processed as a normal read request;
Cache lookup module: for making the client send a data read request to a data node; after receiving the data read request, the data node searches its local cache list; if the data is found, it sends the data to the client directly from its memory; if not, it reads its disk to obtain the data, sends the data to the client, and at the same time notifies the name node of the cache anomaly.
Further, the cache information acquisition module further comprises:
Object acquisition unit: for obtaining an object in the cache policy file;
Path judging unit: for judging whether the path that the object points to is a file or a directory; if it is a file, the metadata forming unit is called; if it is a directory, the metadata forming unit is called after the directory is resolved into multiple files;
Metadata forming unit: for constructing the metadata representing the file to be cached, the metadata including the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period; wherein the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block;
Object judging unit: for judging whether all the objects in the cache policy file have been processed; Data block list forming unit: for gathering the data-block cache paths in all the block cache descriptors of the metadata obtained for each processed object, to obtain the list of data blocks that the current scan needs to cache.
Implementing the data caching method and device for a distributed storage system of the present invention has the following advantages: the configuration file determines the parameters of the files to be cached, and the configuration file is parsed on the name node to obtain the metadata representing the files to be cached; when the name node handshakes with a data node, the parameters describing the relevant data blocks in this metadata are transmitted to the data node, which reads the indicated data blocks from disk and caches them in its memory. In this way, no modification is made to the structure of HDFS itself, and the problem of caching the relevant data is still solved. The structure is therefore simple, and little internal data exchange is required.
Brief description of the drawings
Fig. 1 is a flow chart of the caching method in this embodiment of the data caching method and device for a distributed storage system of the present invention;
Fig. 2 is a schematic diagram of the logical structure of the part that processes the configuration file on the name node in the embodiment;
Fig. 3 is the state transition diagram of a cached data block in the embodiment;
Fig. 4 is a schematic diagram of the structure of the device in the embodiment.
Detailed description of the embodiments
Embodiments of the present invention are further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, in this embodiment of the data caching method and device for a distributed storage system of the present invention, the method comprises the following steps:
Step S11: load the configuration file into memory and parse it. In this embodiment, the distributed storage system comprises multiple data nodes (DataNode, DN) for storing data and a name node (NameNode, NN) connected to the data nodes through a network and managing them; a client obtains the data stored on the data nodes through the name node. In the prior art, without caching, the client sends a read request to the name node, telling it which file it needs to read; the name node finds the storage location of the file and returns that location to the requesting client; after obtaining this information, the client sends read requests to the data nodes that store the file (or the data blocks composing it); the data nodes that receive the requests read the file or data blocks and return them to the client, completing the retrieval of the data or file. In this embodiment, after the caching of a specified file is completed, the steps for reading that file are largely the same. The only differences are that, when a file is cached, the name node returns to the client not the storage location of the file but the location of the cache, and the data node, when it finds a cache entry, does not read the file from disk but sends the cached file to the client directly. In this way, file caching is achieved while adding as little internal data transfer as possible, the speed of obtaining files is increased, and the user experience is improved. Of course, in this embodiment, the caching of the files must first be realized, and the files to be cached are specified by a configuration file written by the client. In particular, the memory of all data nodes (DN) is integrated into a memory pool that is exposed to the client through configuration, allowing the client to customize the caching of data stored in HDFS through the configuration file. HDFS itself has a master/slave structure in which all requests are handled by the NN (NameNode), which makes uniform handling of the data convenient: whether to cache can be specified at storage time, and for reads, reading from the memory of a DN is obviously far more efficient than reading from disk, making the whole system more efficient. In this embodiment, the client writes the configuration file as needed. The configuration file is written in XML, and the parameters that can be specified include the cache path, the caching period, the caching count (adjustable according to the usage frequency of the file), the cache replacement policy, the cache cleanup policy, and so on. The configuration file can be transferred to the name node, or read by the name node; since the configuration information of HDFS is small, reading it entirely into memory does not affect system performance. Therefore, in this implementation, the configuration information is parsed with a simple DOM (Document Object Model) parser, after which reasonable data structures are constructed to hold the information, and a series of worker threads are constructed to keep the whole mechanism working well.
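The patent does not reproduce the configuration file itself; the following is merely a sketch of what such an XML file might look like, with every element name invented for illustration:

```xml
<!-- Hypothetical cache configuration; all element names are illustrative only. -->
<cacheConfiguration>
  <cacheEntry>
    <path>/video/popular/movie.mp4</path>    <!-- cache path -->
    <cachePeriod>86400</cachePeriod>         <!-- caching period, in seconds -->
    <cacheCount>2</cacheCount>               <!-- caching count, by usage frequency -->
    <replacementPolicy>LRU</replacementPolicy> <!-- cache replacement policy -->
    <cleanupPolicy>expire</cleanupPolicy>    <!-- cache cleanup policy -->
  </cacheEntry>
</cacheConfiguration>
```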
Step S12: obtain the metadata and list the descriptors of the data blocks therein. In this step, the name node parses the configuration file in its memory and constructs reasonable data structures to hold it, for use by the following steps. In the implementation, an object, CachePolicy, is defined through the API of HDFS arranged on the name node; it holds the contents read from the configuration information, such as the file path, the caching count, the replacement policy, and the declared period. Similarly, through the API of HDFS a periodic scanning thread, ConfigMonitor, is defined; it reads the scan period from hdfs-site.xml, the configuration file of HDFS itself (the scan period is customizable), and, according to this period, reads the configuration information once per cycle. Whether a path is newly added, or its life cycle, caching count, or replacement policy has been modified, a new CachePolicy must be generated; moreover, considering that a path itself may not change while the files stored under it may have been added or deleted, a CachePolicy object is still constructed even if none of the parameters of a path has changed.
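As a rough sketch only: the patent names the CachePolicy object and the ConfigMonitor thread but gives no code, so the fields, method signatures, and the Timer framing below are all assumptions:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.function.Consumer;

// Sketch of the CachePolicy object named in the patent; the field set is assumed.
class CachePolicy {
    final String path;              // file or directory path to cache
    final int cacheCount;           // caching count
    final long lifeCycleMillis;     // declared caching period
    final String replacementPolicy; // e.g. "LRU"; the value set is an assumption
    CachePolicy(String path, int cacheCount, long lifeCycleMillis, String replacementPolicy) {
        this.path = path;
        this.cacheCount = cacheCount;
        this.lifeCycleMillis = lifeCycleMillis;
        this.replacementPolicy = replacementPolicy;
    }
}

// Sketch of the ConfigMonitor scanning thread named in the patent.
class ConfigMonitor {
    private final Timer timer = new Timer("ConfigMonitor", true);

    // scanPeriodMillis would be read from hdfs-site.xml in the real system.
    void start(long scanPeriodMillis, Consumer<CachePolicy> onPolicy) {
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() {
                // Re-read the configuration every cycle; a new CachePolicy is
                // produced for every path even if its parameters are unchanged,
                // because the files under a directory may have been added or deleted.
                for (CachePolicy p : readConfiguration()) onPolicy.accept(p);
            }
        }, 0, scanPeriodMillis);
    }

    private java.util.List<CachePolicy> readConfiguration() {
        // Placeholder for DOM parsing of the client-supplied XML configuration.
        return java.util.Collections.emptyList();
    }
}
```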
In this step, caching on the name node is managed by CacheManager. CacheManager follows a Facade pattern: it gathers all the relevant data structures and threads and handles all cache-related matters. In this embodiment, every file that needs to be cached, and every data block of each file stored on each data node, is described separately. These descriptors state the caching parameters of the file or data block. The descriptors are kept on the name node, where they form lists; at the same time, they are sent to the relevant data nodes when the name node handshakes with them. Since the handshakes between the name node and the data nodes take place even when there is no caching, transmitting these file or data block descriptors adds nothing, or very little, to the volume of data transmitted inside HDFS. Every file to be cached has a corresponding metadata object, CacheFile, which contains the INodeFile (the file metadata descriptor), all the cached data blocks of the file, and the caching count and life cycle obtained from the CachePolicy. As is common knowledge, a file in HDFS is divided into multiple data blocks. Each data block likewise corresponds to a descriptor object, CachedBlock, which contains the DND of the data node where the block resides (DatanodeDescriptor, the descriptor of the data node on the name node), together with information obtained from block (the HDFS built-in data block object), such as the block sequence number, block length, and block code. In addition, the descriptor of a data block contains four status flags: isSend, isCached, willDelete, and isDeleted, which respectively indicate that the block has been sent, has been cached, is waiting to be deleted, and has been deleted. These states represent the entire activity cycle of a cache block. For the state transitions of a cached data block, refer to Fig. 3.
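A minimal sketch of the CachedBlock descriptor follows; the class name and the four status flags (isSend, isCached, willDelete, isDeleted) are named in the patent, while the remaining fields and the transition helpers are assumptions:

```java
// Sketch of the CachedBlock descriptor; only the four flags are named in the patent.
class CachedBlock {
    final long blockId;       // block sequence number (type assumed)
    final long blockLength;   // block length
    final String datanode;    // stand-in for the DatanodeDescriptor (DND)

    // The four status flags named in the patent: sent, cached, waiting to be
    // deleted, and deleted. Together they cover the whole activity cycle of a
    // cache block (cf. Fig. 3).
    volatile boolean isSend;
    volatile boolean isCached;
    volatile boolean willDelete;
    volatile boolean isDeleted;

    CachedBlock(long blockId, long blockLength, String datanode) {
        this.blockId = blockId;
        this.blockLength = blockLength;
        this.datanode = datanode;
    }

    // Illustrative transitions following the life cycle of Fig. 3.
    void markSent()     { isSend = true; }
    void markCached()   { isCached = true; }
    void markToDelete() { willDelete = true; }
    void markDeleted()  { willDelete = false; isCached = false; isDeleted = true; }
}
```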
In short, in this step, the name node processes the configuration file in its memory as follows: obtain an object in the cache policy file (that is, the parameter or parameter set describing one file to be cached in the configuration file); then judge whether the path that the object points to is a file or a directory (that is, judge whether the location the path points to is a file or a directory of files); if it is a file, construct the metadata representing that file; if it is a directory, resolve it into multiple files and then construct the metadata representing each file respectively.
The method of constructing the metadata representing a file is largely the same in either case; only the parameters of the metadata built differ from file to file. In this embodiment, the metadata includes the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period; the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block. That is, the metadata includes not only a description of the file but also descriptions of all the data blocks composing the file. Since a file in HDFS is composed of multiple data blocks, and these data blocks are not necessarily stored on the same data node, the block cache descriptors are a very important, even key, part of the metadata.
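Correspondingly, a sketch of the CacheFile metadata (class name from the patent, field types assumed), reusing the CachedBlock sketch above:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CacheFile metadata object; field types are assumed.
class CacheFile {
    final String inodePath;   // stand-in for the INodeFile descriptor
    final List<CachedBlock> blocks = new ArrayList<>(); // all block cache descriptors
    final int cacheCount;     // caching count from the CachePolicy
    final long lifeCycleMillis; // caching period from the CachePolicy

    CacheFile(String inodePath, int cacheCount, long lifeCycleMillis) {
        this.inodePath = inodePath;
        this.cacheCount = cacheCount;
        this.lifeCycleMillis = lifeCycleMillis;
    }

    // A file counts as cached only when every one of its blocks is cached.
    boolean isFullyCached() {
        return blocks.stream().allMatch(b -> b.isCached);
    }
}
```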
In this embodiment, one configuration file may contain the caching parameters of multiple files, so a configuration file may yield multiple objects after processing. The procedure described above handles one object; when there are multiple objects, each object is handled in turn according to the above steps until all the objects in the cache policy file have been processed.
In this embodiment, the internal logical structure of the part of the name node that processes the configuration file is shown in Fig. 2. As can be seen from Fig. 2, during the construction of the cache, the TriggerMonitor thread periodically reads CachePolicy objects from the ConfigMonitor thread and then parses the path specified by each CachePolicy: if the specified path is a file, one CachePolicy corresponds to one CacheFile; if the specified path is a directory, one CachePolicy resolves into multiple CacheFiles. CacheManager keeps two lists of CacheFiles: one holds the set of files currently fully in the cached state, and the other holds the set of files that the current scan still has to process. After the TriggerMonitor thread reads a CachePolicy object, it creates a series of CacheFiles; by comparison with the earlier history, new CacheFiles that have not appeared before are added to both file sets; updated CacheFiles (that is, files already recorded as cached whose policy has changed) have their state updated in the full cache file set, the states of all their related CachedBlocks updated accordingly, and are added to the current cache list; files without any change are ignored (since the files read from the CachePolicies are all the files under all paths, there are necessarily duplicates). It should also be mentioned that when a cache file object CacheFile is generated, each generated cached data block CachedBlock is, on the one hand, added to the CacheFile, and on the other hand also added to a dedicated data set, waitingPool; this data structure is used specifically to hold the data blocks that need caching, and it is in fact a priority queue. In this embodiment, priority decreases as the caching count goes from 1 to 3; that is, the smaller the caching count, the higher the priority.
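A sketch of the waitingPool follows; its name and the rule that a smaller caching count means higher priority are from the patent, while the comparator framing and the drain method are assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the waitingPool: a priority queue of blocks awaiting caching.
// The patent states that priority decreases as the caching count goes from
// 1 to 3, i.e. the smaller the caching count, the higher the priority.
class WaitingPool {
    static class PendingBlock {
        final long blockId;
        final int cacheCount; // 1..3; smaller means more urgent
        PendingBlock(long blockId, int cacheCount) {
            this.blockId = blockId;
            this.cacheCount = cacheCount;
        }
    }

    private final PriorityQueue<PendingBlock> queue =
            new PriorityQueue<>(Comparator.comparingInt((PendingBlock b) -> b.cacheCount));

    synchronized void add(PendingBlock b) { queue.add(b); }

    // Used by the CacheMonitor thread to drain up to maxCachedNum blocks per cycle.
    synchronized List<PendingBlock> drain(int maxCachedNum) {
        List<PendingBlock> batch = new ArrayList<>();
        for (int i = 0; i < maxCachedNum && !queue.isEmpty(); i++) batch.add(queue.poll());
        return batch;
    }
}
```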
Step S13: take out a set number of descriptors from the list and send them to the data nodes at handshake. In this step, since in the previous steps TriggerMonitor has read all the files on the cache paths and processed them into CacheFiles and CachedBlocks, the next step of the work is handed over to CacheMonitor. The main function of this thread is to take the not-yet-cached data block labels inside waitingPool and add them inside the corresponding DND, so that the next time the name node transmits orders to a data node, the cache commands for these blocks are sent to that data node. This thread (CacheMonitor) periodically reads a specified number (maxCachedNum) of data blocks, calls the add method of the DND, and adds them to its corresponding queue. Inside the DND, each time a command is built and sent to the corresponding data node, the sent block information is deleted from the DND; that is to say, these data blocks actually live in waitingPool and cachedPool, and only labels are put into and kept inside the DND. Whether or not the data node end succeeds in caching these data blocks is not managed by the DND.
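Continuing the sketch above (the thread name and maxCachedNum are from the patent; the per-node queues standing in for the DND are assumptions), the CacheMonitor cycle might look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the CacheMonitor thread: each cycle it drains up to maxCachedNum
// block labels from the waitingPool and appends them to the per-data-node
// queue (a stand-in for the DND), to be piggybacked on the next handshake.
class CacheMonitor implements Runnable {
    private final WaitingPool waitingPool;
    private final int maxCachedNum;
    private final Map<Long, String> blockLocation; // blockId -> owning data node
    private final Map<String, ConcurrentLinkedQueue<Long>> dndQueues = new HashMap<>();

    CacheMonitor(WaitingPool pool, int maxCachedNum, Map<Long, String> blockLocation) {
        this.waitingPool = pool;
        this.maxCachedNum = maxCachedNum;
        this.blockLocation = blockLocation;
    }

    @Override public void run() {
        for (WaitingPool.PendingBlock b : waitingPool.drain(maxCachedNum)) {
            String node = blockLocation.get(b.blockId);
            if (node != null) {
                dndQueues.computeIfAbsent(node, k -> new ConcurrentLinkedQueue<>())
                         .add(b.blockId);
            }
        }
    }

    // Called at handshake time: labels are removed from the DND once sent;
    // the DND does not track whether the data node caches them successfully.
    List<Long> takeCommandsFor(String node) {
        List<Long> cmds = new ArrayList<>();
        ConcurrentLinkedQueue<Long> q = dndQueues.get(node);
        if (q != null) { Long id; while ((id = q.poll()) != null) cmds.add(id); }
        return cmds;
    }
}
```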
Step S14: cache the specified data blocks in memory. In this step, after receiving the command from the name node through heartBeat (the handshake procedure, or heartbeat connection), the data node scans its local FSDataSet to find the corresponding block and creates a cached backup of it in memory. If the block is not found, the RPC (remote procedure call) method reportBadCache() created for this purpose is called directly to tell the name node that the block was not found; the name node then checks the caching count, and if it is only 1, it looks up another DND of the block and continues trying to cache, while calling the other primary threads of HDFS to repair the data. The count of 1 above refers to the number of copies to be cached. By default, HDFS saves 3 copies of each data block, each written to the disk of a different DN, so there are 3 corresponding DNDs inside the NN. If the aim is to cache a data block once, and the first DND found by the NN has a corresponding DN that cannot cache it, another DND of that data block is looked up and the command is sent to its DN for caching, and so on; if all 3 DNDs have been tried without a successful cache, the data block is considered damaged as far as the caching system is concerned.
If the NN receives three reports that a data block cannot be cached, the data block is thoroughly damaged; the file containing it is then damaged as well and is deleted directly. If the block is found but cannot be cached because memory space is insufficient, the same report is sent to the name node, which selects the next DND of the block to try; if the block is cached successfully, a success flag is reported to the name node, which modifies the state of the block and performs the corresponding management work.
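A sketch of the data-node side just described; reportBadCache() is named in the patent, while the success report and the surrounding structure are assumptions:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the data-node side of step S14; the structure is illustrative.
class DataNodeCacheWorker {
    private final Map<Long, byte[]> memoryCache = new HashMap<>(); // in-memory backups
    private final long memoryBudget;
    private long used;

    DataNodeCacheWorker(long memoryBudget) { this.memoryBudget = memoryBudget; }

    interface NameNodeRpc {
        void reportBadCache(long blockId); // named in the patent
        void reportCached(long blockId);   // assumed success report
    }

    void handleCacheCommands(List<Long> blockIds, NameNodeRpc nn) {
        for (long id : blockIds) {
            byte[] data = readFromLocalFsDataSet(id); // scan the local FSDataSet
            if (data == null) {
                nn.reportBadCache(id);                // block not found locally
            } else if (used + data.length > memoryBudget) {
                nn.reportBadCache(id);                // insufficient memory; NN tries the next DND
            } else {
                memoryCache.put(id, data);
                used += data.length;
                nn.reportCached(id);                  // NN then updates the block state
            }
        }
    }

    private byte[] readFromLocalFsDataSet(long blockId) {
        return null; // placeholder for reading the block from disk
    }
}
```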
Step S15: record the execution of the caching. In this step, the data node saves the results of the data-block caching performed in the manner described above.
Step S16: return the execution results at the next handshake. In this step, when a data node handshakes with the name node again, it sends back to the name node the execution results of the data-block cache commands stored since the last handshake. If a data block has been cached successfully, it must be judged whether the other data blocks of its file have also been cached successfully: only when all the data blocks composing a file have been cached successfully can the caching of the file be deemed successful, and the modification of the file's state can begin; if one data block has not been cached successfully, that data block must be handled or repaired according to the method in step S14.
This embodiment further includes the step of reading after the data has been cached. When a client application reads a file, it first sends a read request to the name node. After the name node receives the request, it first checks whether the cache list contains this file; if so, it directly packs the internal CachedBlock[] and returns to the client the location information of a group of data blocks (the group of data blocks composing the requested file); this process is transparent to the client. If the file is not in the cache, the normal data reading process is followed (i.e., the process for reading a file in the prior-art HDFS system without caching). After the client obtains the specific locations of the data blocks, it sends read requests directly to the data nodes DN at those locations, asking to read the data. After a DN receives a request, it checks whether the data block exists in its cache list; if so, the in-memory file is returned directly, completing the data read; otherwise the normal read process is called to read from disk; the whole process is again transparent to the client. In this embodiment, all the cached data blocks are described at both the NN and DN ends, so certain mechanisms are needed to guarantee that the whole flow runs smoothly. The core CacheManager at the NN end contains a large number of methods for handling the feedback information of the DN end: a data block being cached successfully, failing to cache, running out of memory, not being found, and so on, are all handled accordingly. Two further threads are used: a ClearMonitor that periodically cleans up expired cache files, and a HandleMonitor that periodically checks whether a file has been cached successfully. Only when a file has been cached successfully does its caching countdown begin; and whether a file has been cached successfully is judged by whether all the data blocks belonging to it have been cached successfully, in which case the file can be considered successfully cached.
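A sketch of the cache-aware read path on the name node (CachedBlock is named in the patent and reused from the sketch above; the method shape is an assumption): cached files short-circuit to cache locations, and everything else falls back to the normal read path, transparently to the client.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of the cache-aware read path on the name node; the shape is illustrative.
class NameNodeReadHandler {
    private final Map<String, List<CachedBlock>> cacheList; // file -> its cached blocks

    NameNodeReadHandler(Map<String, List<CachedBlock>> cacheList) {
        this.cacheList = cacheList;
    }

    // Returns cache locations if the file is fully cached; an empty result means
    // the caller should fall back to the normal (disk) read process. The response
    // format is the same either way, so the process is transparent to the client.
    Optional<List<CachedBlock>> locateForRead(String filePath) {
        List<CachedBlock> blocks = cacheList.get(filePath);
        if (blocks != null && blocks.stream().allMatch(b -> b.isCached)) {
            return Optional.of(blocks); // pack the CachedBlock[] and return it
        }
        return Optional.empty();        // not cached: normal read process
    }
}
```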
It is worth mentioning that, in this embodiment, all the above threads, processes, functions, and functional modules realizing particular functions are API calls of HDFS or definitions made through them; therefore this embodiment makes no change to the existing HDFS structure. On the contrary, merely adding some functional modules on the basis of the existing HDFS realizes the caching of data, utilizes the idle memory of the data nodes, and improves the speed of data reading.
Furthermore, this embodiment also relates to a data caching device for a distributed storage system, wherein the distributed storage system comprises multiple data nodes for storing data and a name node connected to these data nodes through a network and managing them. As shown in Fig. 4, the device comprises a configuration file parsing module 41, a cache information acquisition module 42, a cache information sending module 43, a cache realization module 44, a read request module 45, and a cache lookup module 46. The configuration file parsing module 41 is used for loading a configuration file into the memory of the name node and parsing the configuration file; the cache information acquisition module 42 is used for obtaining the information about caching in the configuration file, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching; the cache information sending module 43 is used for making the name node transmit to each data node, when handshaking with it, the cache information that concerns that data node; the cache realization module 44 is used for making the data node read the data to be cached according to the cache information it receives and store it in its memory, the data node notifying the name node at the next handshake after the caching is complete; the read request module 45 is used for making the client send a read request to the name node, the name node checking whether the requested data has been cached, returning the cache location to the client if so, and processing the request as a normal read request if not; the cache lookup module 46 is used for making the client send a data read request to a data node, the data node searching its local cache list after receiving the data read request, sending the data to the client directly from its memory if the data is found, and otherwise reading its disk to obtain the data, sending the data to the client, and at the same time notifying the name node of the cache anomaly.
In this embodiment, the cache information acquisition module 42 further comprises an object acquisition unit 421, a path judging unit 422, a metadata forming unit 423, an object judging unit 424, and a data block list forming unit 425. The object acquisition unit 421 is used for obtaining an object in the cache policy file; the path judging unit 422 is used for judging whether the path that the object points to is a file or a directory, calling the metadata forming unit if it is a file, and calling the metadata forming unit after resolving the directory into multiple files if it is a directory; the metadata forming unit 423 is used for constructing the metadata representing the file to be cached, the metadata including the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period, wherein the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block; the object judging unit 424 is used for judging whether all the objects in the cache policy file have been processed; the data block list forming unit 425 is used for gathering the data-block cache paths in all the block cache descriptors of the metadata obtained for each processed object, to obtain the list of data blocks that the current scan needs to cache.
The above is only an embodiment of the present invention and cannot be understood as limiting the scope of the present patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A data caching method for a distributed storage system, the distributed storage system comprising multiple data nodes for storing data and a name node connected to the multiple data nodes through a network and managing the multiple data nodes, characterized in that the method comprises the following steps:
A) the name node loads a configuration file into its memory and parses the configuration file;
B) obtain the information about caching in the configuration file, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching;
C) when handshaking with each data node, the name node transmits to that data node the cache information that concerns it;
D) according to the cache information it receives, the data node reads the data to be cached and stores it in its memory; after the caching is complete, the data node notifies the name node at the next handshake.
2. The data caching method for a distributed storage system according to claim 1, characterized in that it further comprises the following steps:
E) the client sends a read request to the name node; the name node checks whether the requested data has been cached; if so, it returns the cache location to the client; if not, the request is processed as a normal read request;
F) the client sends a data read request to a data node; after receiving the data read request, the data node searches its local cache list; if the data is found, it sends the data to the client directly from its memory; if not, it reads its disk to obtain the data, sends the data to the client, and at the same time notifies the name node of the cache anomaly.
3. The data caching method for a distributed storage system according to claim 2, characterized in that the configuration file is defined by the client and transferred to the name node; on the name node the configuration file is converted into a cache policy file, which contains multiple objects each representing a file to be cached; through a periodic scanning thread, the name node obtains the cache policy file in each period and replaces the copy in its memory.
4. The data caching method for a distributed storage system according to claim 3, characterized in that the step B) further comprises:
B1) obtain an object in the cache policy file;
B2) judge whether the path that the object points to is a file or a directory; if it is a file, execute step B3); if it is a directory, resolve it into multiple files and then execute step B3); B3) construct the metadata representing the file to be cached, the metadata including the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period; wherein the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block;
B4) repeat the above steps B1)-B3) until all the objects in the cache policy file have been processed.
5. The data caching method for a distributed storage system according to claim 4, characterized in that the step B) further comprises the following step:
gathering the data-block cache paths in all the block cache descriptors of the metadata obtained for each processed object, to obtain the list of data blocks that the current scan needs to cache.
6. The data caching method for a distributed storage system according to claim 5, characterized in that the step C) further comprises:
C1) taking out a set number of data-block cache paths from the list of data blocks that the current scan needs to cache;
C2) for the data node that each cache path points to, sending the data-block cache path to that data node when handshaking with it.
7. The data caching method for a distributed storage system according to claim 6, characterized in that the step D) further comprises:
D1) obtaining all the data-block cache paths received in this handshake, reading each data block according to its cache path in turn, and caching it in the memory of the data node; if a data block is not found, or the memory space is insufficient for caching, marking that data-block caching as failed and processing the next data-block cache path;
D2) returning the caching results of all the data blocks to the name node at the next handshake.
8. A data caching device for a distributed storage system, the distributed storage system comprising multiple data nodes for storing data and a name node connected to the multiple data nodes through a network and managing the multiple data nodes, characterized in that the device comprises:
Configuration file parsing module: for loading a configuration file into the memory of the name node and parsing the configuration file;
Cache information acquisition module: for obtaining the information about caching in the configuration file, the information including the cache path, the caching mode, the number of replicas, and the condition for cancelling caching;
Cache information sending module: for making the name node transmit to each data node, when handshaking with it, the cache information that concerns that data node;
Cache realization module: for making the data node read the data to be cached according to the cache information it receives and store it in its memory; after the caching is complete, the data node notifies the name node at the next handshake.
9. The device according to claim 8, characterized in that it further comprises:
Read request module: for making the client send a read request to the name node; the name node checks whether the requested data has been cached; if so, it returns the cache location to the client; if not, the request is processed as a normal read request;
Cache lookup module: for making the client send a data read request to a data node; after receiving the data read request, the data node searches its local cache list; if the data is found, it sends the data to the client directly from its memory; if not, it reads its disk to obtain the data, sends the data to the client, and at the same time notifies the name node of the cache anomaly.
10. The device according to claim 8, characterized in that the cache information acquisition module further comprises:
Object acquisition unit: for obtaining an object in the cache policy file;
Path judging unit: for judging whether the path that the object points to is a file or a directory; if it is a file, the metadata forming unit is called; if it is a directory, the metadata forming unit is called after the directory is resolved into multiple files;
Metadata forming unit: for constructing the metadata representing the file to be cached, the metadata including the file descriptor, all the block cache descriptors contained in the file, the caching count, and the caching period; wherein the block cache descriptor includes the cache path of the block, the descriptor on the name node of the data node where the block resides, the block sequence number, the block length, and the status flags of the data block;
Object judging unit: for judging whether all the objects in the cache policy file have been processed; Data block list forming unit: for gathering the data-block cache paths in all the block cache descriptors of the metadata obtained for each processed object, to obtain the list of data blocks that the current scan needs to cache.
CN201480078749.6A 2014-05-28 2014-05-28 Data caching method and device for a distributed storage system Pending CN107111615A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/078656 WO2015180070A1 (en) 2014-05-28 2014-05-28 Data caching method and device for distributed storage system

Publications (1)

Publication Number Publication Date
CN107111615A true CN107111615A (en) 2017-08-29

Family

ID=54697852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480078749.6A Pending CN107111615A (en) 2014-05-28 2014-05-28 Data caching method and device for a distributed storage system

Country Status (2)

Country Link
CN (1) CN107111615A (en)
WO (1) WO2015180070A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542347A (en) * 2018-11-19 2019-03-29 浪潮电子信息产业股份有限公司 A kind of data migration method, device, equipment and readable storage medium storing program for executing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499095A (en) * 2009-03-11 2009-08-05 南京联创科技股份有限公司 Buffer construction method used for data sharing platform
US20120197868A1 (en) * 2009-08-24 2012-08-02 Dietmar Fauser Continuous Full Scan Data Store Table And Distributed Data Store Featuring Predictable Answer Time For Unpredictable Workload
CN103034617A (en) * 2012-12-13 2013-04-10 东南大学 Caching structure for realizing storage of configuration information of reconfigurable system and management method
CN103595776A (en) * 2013-11-05 2014-02-19 福建网龙计算机网络信息技术有限公司 Distributed type caching method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1754155A (en) * 2003-01-17 2006-03-29 泰斯特网络公司 Method and system for use of storage caching with a distributed file system
CN101901275A (en) * 2010-08-23 2010-12-01 华中科技大学 Distributed storage system and method thereof
WO2013019913A1 (en) * 2011-08-02 2013-02-07 Jadhav Ajay Cloud-based distributed persistence and cache data model
CN103699660B (en) * 2013-12-26 2016-10-12 清华大学 A kind of method of large scale network stream data caching write

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499095A (en) * 2009-03-11 2009-08-05 南京联创科技股份有限公司 Buffer construction method used for data sharing platform
US20120197868A1 (en) * 2009-08-24 2012-08-02 Dietmar Fauser Continuous Full Scan Data Store Table And Distributed Data Store Featuring Predictable Answer Time For Unpredictable Workload
CN103034617A (en) * 2012-12-13 2013-04-10 东南大学 Caching structure for realizing storage of configuration information of reconfigurable system and management method
CN103595776A (en) * 2013-11-05 2014-02-19 福建网龙计算机网络信息技术有限公司 Distributed type caching method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TechTarget China: "Hadoop 2 Centralized Cache Management and Code Analysis", https://searchbi.techtarget.com.cn/4-1853/ *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542347A (en) * 2018-11-19 2019-03-29 浪潮电子信息产业股份有限公司 A kind of data migration method, device, equipment and readable storage medium storing program for executing
CN109542347B (en) * 2018-11-19 2022-02-18 浪潮电子信息产业股份有限公司 Data migration method, device and equipment and readable storage medium

Also Published As

Publication number Publication date
WO2015180070A1 (en) 2015-12-03

Similar Documents

Publication Publication Date Title
US10901956B2 (en) Indexing of linked data
KR102307371B1 (en) Data replication and data failover within the database system
US10706088B2 (en) Progressive analysis for big data
CN104050249B (en) Distributed query engine system and method and meta data server
CN102523279B A distributed file system and hot-file access method thereof
US20150134797A1 (en) Managed service for acquisition, storage and consumption of large-scale data streams
CN106462575A (en) Design and implementation of clustered in-memory database
US10795662B2 (en) Scalable artifact distribution
CN103605698A (en) Cloud database system used for distributed heterogeneous data resource integration
JP5375972B2 (en) Distributed file system, data selection method thereof, and program
US11250022B1 (en) Offline index builds for database tables
US11620310B1 (en) Cross-organization and cross-cloud automated data pipelines
CN112596762A (en) Rolling upgrading method and device
Fan et al. Gear: Enable efficient container storage and deployment with a new image format
CN107111615A (en) A kind of data cache method and device for distributed memory system
CN112905676A (en) Data file importing method and device
Parthasarathy Learning Cassandra for Administrators
Lazovik et al. Runtime modifications of spark data processing pipelines
US20190057120A1 (en) Efficient Key Data Store Entry Traversal and Result Generation
US11734301B1 (en) Selective table replication to enable stream replication
US11921700B1 (en) Error tables to track errors associated with a base table
Faniband et al. Netmob: A mobile application development framework with enhanced large objects access for mobile cloud storage service
Choudhry HBase High Performance Cookbook
US20240104074A1 (en) Location-constrained storage and analysis of large data sets
Bhat OPTIMIZING COLD START LATENCY IN SERVERLESS COMPUTING

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170829