CN103020315A

CN103020315A - Method for storing mass of small files on basis of master-slave distributed file system

Info

Publication number: CN103020315A
Application number: CN2013100091824A
Authority: CN
Inventors: 王蕾; 何连跃; 徐叶; 李姗姗; 戴华东; 吴庆波; 丁滟; 黄辰林; 付松龄
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2013-01-10
Filing date: 2013-01-10
Publication date: 2013-04-03
Anticipated expiration: 2033-01-10
Also published as: CN103020315B

Abstract

The invention discloses a method for storing massive numbers of small files on a master-slave distributed file system, with the aim of solving the problem of storing massive small files in such a system. In the technical scheme, a massive small file storage system is first deployed and initialized; the SmallFileAPI (small file application program interface) of the client then creates and reads small files according to commands received from the keyboard. When a small file is created, SmallFileAPI creates the data file for the small file according to the small-file path obtained from the client, writes the small-file data, and creates the small-file index on a data node. When a small file is read, the data node information corresponding to the parent directory is obtained from the small-file path, an index request is sent to any one of the data nodes, and the small-file data is finally read from the data file according to the index information. The method solves the problem of the huge volume of metadata generated by massive small files, improves the write efficiency of the massive small file storage system, and ensures system reliability.

Description

A method for storing massive numbers of small files based on a master-slave distributed file system

Technical field

The present invention relates to a method for storing massive numbers of small files on a master-slave distributed file system that is designed for storing massive large files.

Background technology

With the development of new computing technologies, both enterprise and personal data have begun to grow rapidly. The growth of massive data brings not only a storage capacity problem but also challenges for data management and storage performance, and it has become a key problem that the cloud computing era must solve. To ensure high availability, high reliability, and economy of data, cloud computing stores data in a distributed manner and uses redundant storage to guarantee data reliability. To meet the demands of a large number of users, cloud storage technology must offer high throughput and high transfer rates. Industry and academia have proposed many solutions to the cloud data storage problem, including the Google file system GFS, the open-source Hadoop file system HDFS, and NoSQL storage systems for semi-structured and structured data such as Dynamo, Cassandra, and MongoDB.

In the early days of cloud computing, storage systems were designed mainly for efficient storage of and access to massive large files, and their support for small files was weak. With the development of personal terminals and the mobile Internet, however, small files account for an ever larger share of cloud storage systems, and efficient storage of and access to massive small files has become an urgent problem. A small file is a file whose size ranges from a few KB to tens of KB. For example, Taobao needs to store a massive number of product pictures, all of which are small files; search engines such as Google and Baidu need to crawl tens of thousands to hundreds of millions of web pages from the network, and these web pages are all small files. Small files in quantities on the order of a trillion constitute massive small files. If massive small files cannot be stored efficiently, applications that deal with them either cannot be realized or cannot meet customer requirements. The present invention mainly solves the problem of efficiently storing massive small files.

The architecture of a master-slave distributed file system is shown in Fig. 1. A distributed file system of this type consists of one centralized metadata server (also called the metadata node) and multiple distributed data servers (also called data nodes). The metadata server manages the metadata of the file system, including the directory structure and the storage location, size, and various attributes of each file. The data servers store the data of the file system, that is, the files themselves. When a client accesses a master-slave distributed file system, it first accesses the metadata server to obtain the metadata of the file, and then, according to this information, accesses the data server that stores the file to obtain the file. The advantage of this type of file system is that its design is simple to implement and easy to manage, and that high fault tolerance, high reliability, and high throughput can be achieved with simple techniques. The disadvantage is that a single metadata server becomes the performance bottleneck of system access and is prone to single point failure. If a metadata server cluster is used instead, metadata management becomes complex and the efficiency of metadata access is reduced.

Typical representatives of this type of distributed file system are the Google file system GFS, the Hadoop file system HDFS, Lustre, and PVFS. Of these, HDFS is open source and can run on general-purpose hardware platforms. A cluster running HDFS consists of one metadata node and multiple data nodes. HDFS is a highly fault-tolerant system that is suitable for deployment on inexpensive machines, provides high-throughput data access, is well suited to applications on large data sets, and supports streaming reads of file system data.

At present, the main methods for storing massive small files on a master-slave distributed file system are the following:

Method one is the Hadoop archive file, HAR (Hadoop Archive) for short. To solve the problem that storing massive small files in HDFS exhausts the memory of the metadata node, Hadoop proposed the HAR archive file method. HAR packs files efficiently into HDFS blocks, which reduces the memory used on the metadata node while still allowing transparent access to the files. The method packs multiple small files into one HAR file. HAR files, however, have some shortcomings. First, a HAR file cannot be modified once it has been created; to add or delete small files, a new HAR file must be created. Second, after a HAR file has been created, the original small files are not deleted automatically and must be deleted manually. HAR files are therefore very inefficient for handling massive small files.

Method two is the approach proposed by Xuhui Liu, Jizhong Han, et al. of the Chinese Academy of Sciences in "Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS". Small files are packed together to form one large file, and the header of the large file holds the index information of all small files it contains. The large file is stored as an ordinary file of the master-slave distributed file system. To retrieve a small file, the client first queries the metadata server to obtain the metadata of the large file, then interacts with a data server to read the large file, obtains the index information from the header, and finally reads the file content that follows. Because distributed file systems designed for massive large files mostly use streaming reads, random reads are inefficient and have high latency; when multiple clients read multiple small files at the same time, efficiency is very low and flexibility is poor.

Method three is the approach published by Jiang Liu et al. of Beijing University of Posts and Telecommunications in their research on small-file storage optimization techniques for HDFS. Small files are kept in the data blocks of data nodes; the metadata of a small file records information such as the position of the small file within its data block and is stored on the data node. The metadata of all small files is also kept on the hard disk of the metadata server. To retrieve a small file, the client first checks whether the recently accessed data node holds the small file. If it does, the metadata node need not be accessed. If it does not, the metadata server must read the hard disk to obtain the complete metadata of the requested small file and return it to the client, which then interacts with the data node to obtain the small file. The problem with this method is that metadata access for small files is inefficient and has long latency.

In summary, current master-slave distributed file systems are all based on methods for storing large files. When they are used to store massive small files, they generally suffer from low system efficiency, inefficient small-file index lookup, and poor system reliability. How to store massive small files efficiently and reliably on a master-slave distributed file system designed for massive large files is a technical problem of concern to those skilled in the art.

Summary of the invention

The technical problem to be solved by the present invention is the following. A general master-slave distributed file system can store files of very large data scale and has high fault tolerance and high scalability, but using it to store massive small files raises several problems. (1) A master-slave distributed file system with a centralized metadata service has only a single metadata node. The number of files determines the size of the metadata, so massive small files consume the memory of the metadata node; their metadata can exhaust that memory and exceed the limit that computer hardware can reach. (2) Retrieval of massive small files is inefficient. Once the number of files reaches a certain scale, file retrieval efficiency drops sharply and the system runs slowly.

The technical scheme of the present invention is as follows:

Step 1: Deploy the massive small file storage system. The massive small file storage system consists of a master-slave distributed file system and the software on each of its nodes that handles massive small files. This software comprises the index position maintenance module on the metadata node, the small-file index module on each data node, the client cache module, and SmallFileAPI, the client's dedicated interface for operating on small files.

The index position maintenance module assigns data nodes (identified by IP address and port number) to each directory, keeps the mappings between directories and data nodes in sorted order, and returns to the client the data nodes that manage the index of a given small file, that is, the data nodes assigned to the small-file directory. The module stores the mapping between directories and data nodes in the index position mapping table. Each entry of the table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes the value "Y" or "N". "Y" means that the small-file index for the directory on that data node is up to date and need not be updated; "N" means that it is not up to date and must be updated. The module creates a queue, waitArrangeQueue, whose entries are directory paths. The queue records directories to which data nodes could not be successfully assigned, so that when a new data node joins the distributed file system the module can reassign the directories in waitArrangeQueue. The module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index position mapping table, and waitArrangeDir stores the contents of waitArrangeQueue.

The capacity of the client cache module is initialized to M, so that it can hold the cache records for M directories. Each cache record stores the addresses of the data nodes that manage the small-file index of a directory the user has recently and frequently used (that is, the index position mapping entry for the directory) together with the input/output information of the small files' data files. M is a positive integer set by the user as needed.

The small-file index module receives index creation and query requests from the client, decides from the update flag in the index position mapping table whether the small-file index data must be loaded, and returns to the client either the small-file index or the result of whether creation succeeded. On startup, the module creates the small-file index data structure Index, a B-tree (see R. Bayer and E. M. McCreight, "Organization and Maintenance of Large Ordered Indexes", Acta Informatica, 1972) ordered by directory path. Each node of Index is the set of small-file index records under one directory, which is itself a B-tree ordered by small-file name. A small-file index record is the index of one small file and comprises the path name of the small file, the path of its data file, and the offset of the small file within the data file.

SmallFileAPI is the software through which the client exchanges data with the massive small file storage system; it provides the operations for creating and reading small files.
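The seven-field mapping entry and the waiting queue described above can be pictured with a minimal Python sketch. The class and attribute names (`MappingEntry`, `IndexPositionTable`, `primary_flag`, and so on) are hypothetical and chosen only for illustration; the patent does not specify an in-memory representation, and a plain dictionary stands in for the sorted mapping.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MappingEntry:
    """One row of the index position mapping table (seven fields)."""
    directory: str
    primary: str                 # "ip:port" of the primary data node
    primary_flag: str = "Y"      # update flag, "Y" or "N"
    replica1: str = ""           # first replica data node
    replica1_flag: str = "Y"
    replica2: str = ""           # second replica data node
    replica2_flag: str = "Y"


class IndexPositionTable:
    """Hypothetical in-memory view kept by the index position maintenance module."""

    def __init__(self) -> None:
        self.entries: Dict[str, MappingEntry] = {}
        # Directories that could not be assigned data nodes yet.
        self.wait_arrange_queue: deque = deque()

    def insert(self, entry: MappingEntry) -> None:
        self.entries[entry.directory] = entry

    def lookup(self, directory: str) -> Optional[MappingEntry]:
        """Return the entry for a directory, or None if none is assigned."""
        return self.entries.get(directory)
```

A client-side lookup then amounts to `table.lookup("/photos")`, which returns the three node addresses and their flags in one object.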

Step 2: Initialize the massive small file storage system, as follows:

2.1 Initialize the index position mapping table by reading its data from the file dirToDatanode. If the dirToDatanode file is empty, the index position mapping table is initialized as an empty table. Thereafter, whenever the index position mapping table is modified, all of its data is saved to the dirToDatanode file again.

2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir. If the waitArrangeDir file is empty, waitArrangeQueue is initialized as an empty queue. Thereafter, whenever waitArrangeQueue is modified, all of its data is saved to the waitArrangeDir file again.

2.3 Initialize the index data structure Index as an empty B-tree. Index reads index data dynamically from the index files and index log files according to the requests made by clients.
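The load-then-rewrite behaviour of steps 2.1 and 2.2 can be sketched as follows. This is only an illustration under assumptions: the patent does not define the on-disk layout of dirToDatanode, so the tab-separated, one-entry-per-line format and the function names `load_table` and `save_table` are hypothetical.

```python
import os

FIELDS = 7  # directory + three (data node, update flag) pairs


def load_table(path):
    """Read the mapping table; an empty or missing file yields an empty table."""
    table = {}
    if not os.path.exists(path):
        return table
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == FIELDS:
                table[parts[0]] = parts
    return table


def save_table(path, table):
    """Rewrite the whole file, as step 2.1 requires after every modification."""
    with open(path, "w", encoding="utf-8") as f:
        for row in table.values():
            f.write("\t".join(row) + "\n")
```

The same pair of helpers would serve waitArrangeDir with one directory path per line instead of seven fields.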

Step 3: SmallFileAPI on the client operates on small files according to the command received from the keyboard. To create a small file, perform step 4; to read a small file, go to step 8.

Step 4: SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the data file under the directory indicated by the small-file path (the small-file directory for short) and the index position mapping entry corresponding to the small-file directory. A data file is an actual file in the namespace of the master-slave distributed file system and stores the data of all small files under the same directory. A data file consists of a data file header followed by data records. The header consists of four fields: the first field occupies three bytes and describes the file type, to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number (Version) of the data file; the third field gives the key type, that is, the data type in which keys are stored; the fourth field gives the value type, that is, the data type in which values are stored. The header is immediately followed by one or more records, each of which stores the complete data of one small file. Each record consists of four items: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the file name. Each small file is stored as one record, appended directly to the end of the data file.
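To make the record layout concrete, here is a minimal Python sketch of writing and reading such a data file. Several details are assumptions not stated in the patent: the 3-byte marker `SMF`, the length-prefixed encoding of the key and value type fields, 4-byte big-endian integers for the two length fields, and a record length that counts the key-length field, key, and value.

```python
import io
import struct

MAGIC = b"SMF"   # hypothetical 3-byte file-type marker
VERSION = 1      # 1-byte version number


def write_header(f, key_type=b"str", value_type=b"bytes"):
    """Write the four header fields: type marker, version, key type, value type."""
    f.write(MAGIC)
    f.write(struct.pack(">B", VERSION))
    for type_name in (key_type, value_type):
        # Assumed encoding: one length byte followed by the type name.
        f.write(struct.pack(">B", len(type_name)) + type_name)


def append_record(f, name, content):
    """Append one small file as a record and return its offset in the data file."""
    key = name.encode("utf-8")
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    record_len = 4 + len(key) + len(content)  # key-length field + key + value
    f.write(struct.pack(">II", record_len, len(key)))
    f.write(key)
    f.write(content)
    return offset


def read_record(f, offset):
    """Read back the (file name, content) pair stored at the given offset."""
    f.seek(offset)
    record_len, key_len = struct.unpack(">II", f.read(8))
    key = f.read(key_len).decode("utf-8")
    value = f.read(record_len - 4 - key_len)
    return key, value
```

The offset returned by `append_record` is the value that the later steps store in the small-file index record, so a read needs only one seek into the data file.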

4.1 SmallFileAPI on the client checks whether the cache module contains the information for the small-file directory. This information comprises the index position mapping entry for the directory and the input/output information of the small-file data file. If it can be obtained from the cache module, go to step 5. If it is not found in the cache module, perform step 4.2.

4.2 SmallFileAPI on the client extracts the path of the small-file directory from the path of the small file. If the small-file directory does not exist, it is created; at the same time, the index position maintenance module on the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as a table entry.

The index position maintenance module assigns three data nodes to the directory as follows:

4.2.1 The index position maintenance module randomly selects three data nodes from the information on all data nodes maintained by the metadata node (in a master-slave distributed file system the data nodes register with the metadata node, so the metadata node holds information on all data nodes in the cluster). If three data nodes are obtained successfully, the update flags of these three data nodes are initialized to Y and the procedure goes to step 4.2.3. If three data nodes cannot be found, perform step 4.2.2.

4.2.2 The directory to which data nodes could not be assigned is added to the queue waitArrangeQueue, and the contents of waitArrangeQueue are saved to the waitArrangeDir file again. A failure signal is returned to the client, and the procedure goes to step 13 to end the operation.

4.2.3 The index position maintenance module saves the index position mapping table to the dirToDatanode file again.
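Steps 4.2.1 to 4.2.3 can be summarized in a short Python sketch. The function name `allocate_nodes`, the dictionary keys, and the optional `rng` parameter are hypothetical; persistence to dirToDatanode and waitArrangeDir is left out and only noted in comments.

```python
import random
from collections import deque


def allocate_nodes(directory, registered_nodes, table, wait_queue, rng=random):
    """Pick three distinct data nodes at random for a directory."""
    if len(registered_nodes) < 3:
        # Step 4.2.2: park the directory until new data nodes join.
        wait_queue.append(directory)
        # (The queue would now be saved to waitArrangeDir.)
        return None  # the caller reports failure to the client
    # Step 4.2.1: random choice of three nodes, all flags initialized to "Y".
    primary, replica1, replica2 = rng.sample(list(registered_nodes), 3)
    table[directory] = {
        "primary": primary, "primary_flag": "Y",
        "replica1": replica1, "replica1_flag": "Y",
        "replica2": replica2, "replica2_flag": "Y",
    }
    # Step 4.2.3: the table would now be saved to dirToDatanode.
    return table[directory]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests; in normal use the module-level generator is sufficient.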

4.3 Set variable X = 1.

4.4 If data file dataX does not exist under the small-file directory, create dataX under the small-file directory through the metadata node.

4.5 If data file dataX exists under the small-file directory, SmallFileAPI on the client requests from the metadata node the output information of data file dataX under the small-file directory. If the output information of dataX is obtained successfully, perform step 4.6. If it is not obtained (dataX under the small-file directory is at that moment in use by another client), increase X by 1; if X <= P, go to step 4.4; if X > P, return an error message to the client and go to step 13. P is the number of data files that may be created under the directory; it is a positive integer set by the user, typically P = 32.

4.6 SmallFileAPI on the client asks the index position maintenance module on the metadata node for the data nodes corresponding to the small-file directory. The index position maintenance module looks up the index position mapping table and returns the entry for this small-file directory to the client.

4.7 SmallFileAPI on the client records the index position mapping entry for the small-file directory and the data file output information of the small file in the cache module. If the cache module is full, an entry is evicted using the LRU (least recently used) algorithm.
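The client cache of capacity M with LRU eviction, used in steps 4.1 and 4.7, can be sketched with Python's `collections.OrderedDict`. The class name `DirectoryCache` and the shape of the cached record are hypothetical; the patent only states that each record holds the mapping entry and the data file input/output information for one directory.

```python
from collections import OrderedDict


class DirectoryCache:
    """Holds at most `capacity` (M) directory records, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity M must be a positive integer")
        self.capacity = capacity
        self._records = OrderedDict()

    def get(self, directory):
        """Return the cached record and mark it as most recently used, or None."""
        if directory not in self._records:
            return None
        self._records.move_to_end(directory)
        return self._records[directory]

    def put(self, directory, record):
        """Insert or refresh a record, evicting the oldest one when full."""
        self._records[directory] = record
        self._records.move_to_end(directory)
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)  # drop the least recently used
```

With this structure a cache hit in step 4.1 costs one dictionary lookup, and only a miss triggers the round trip to the metadata node described in steps 4.2 to 4.6.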

Step 5: The client writes the data of the small file into the data file. The client writes the small file as one data record into the data file dataX obtained in step 4.5, and the write returns the offset of the small file within the data file, that is, the position of the data record in the data file.

Step 6: The small-file index module on the data node creates the small-file index. The client sends the small-file path, the name of the data file that stores the small file, the offset of the small file within the data file, and the data node update flag to the primary data node among the three data nodes of the index position mapping entry obtained in step 4.6, and requests creation of the small-file index. If the primary data node has failed, a failure result is returned to the client and the procedure goes to step 13. If the primary data node is working normally, then after it receives the request its small-file index module does the following:

6.1 The small-file index module decides from the update flag of the primary data node whether the small-file index for this directory must be updated. If the update flag is Y, perform 6.2; if the update flag is N, go to 6.3.

6.2 The small-file index module reads two files from the distributed file system, with paths /index/<small-file directory path>.index and /index/<small-file directory path>.log. The <small-file directory path>.index file is called the index file and holds all small-file index records under the directory; these records are saved after being sorted by small-file path name in a B-tree data structure. The <small-file directory path>.log file is called the log file and holds records of operations on the small-file index of the directory, including creation and deletion of indexes; each log record consists of an operation type and an index record. The operation type is the action performed, such as create or delete.

The small-file index module reads the index data from the <small-file directory path>.index file and the <small-file directory path>.log file in the following steps:

6.2.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree from the data, and inserts it as a node into the in-memory index data structure Index.

6.2.2 The small-file index module reads the index operation records in <small-file directory path>.log in order and replays the operations they describe. For example, if the operation type is create, the index information in the operation record is extracted and inserted into Index using the B-tree insertion algorithm.

6.2.3 The small-file index module renames <small-file directory path>.index to <small-file directory path>.index.tmp, creates a new index file named <small-file directory path>.index, saves all index records for this small-file directory in Index to the new index file, and deletes the index file with the .tmp suffix.
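The load, replay, and rewrite sequence of steps 6.2.1 to 6.2.3 follows the familiar snapshot-plus-log pattern and can be sketched as below. This is an illustration under assumptions: the patent does not define the file formats, so JSON lines are used for both files, local paths stand in for distributed file system paths, a dictionary stands in for the B-tree, and the record keys (`path`, `data_file`, `offset`, `type`, `record`) are hypothetical.

```python
import json
import os


def rebuild_index(index_path, log_path):
    """Load the .index snapshot, replay the .log, then rewrite the snapshot."""
    index = {}
    # Step 6.2.1: load the saved index records.
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                index[rec["path"]] = rec
    # Step 6.2.2: replay logged operations in order.
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                op = json.loads(line)
                if op["type"] == "create":
                    index[op["record"]["path"]] = op["record"]
                elif op["type"] == "delete":
                    index.pop(op["record"]["path"], None)
    # Step 6.2.3: keep the old snapshot as .tmp while the new one is written.
    tmp_path = index_path + ".tmp"
    if os.path.exists(index_path):
        os.replace(index_path, tmp_path)
    with open(index_path, "w", encoding="utf-8") as f:
        for path in sorted(index):  # saved in path-name order
            f.write(json.dumps(index[path]) + "\n")
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    return index
```

Keeping the old file under the .tmp name until the new snapshot is complete means that a crash during the rewrite still leaves one usable copy of the index on disk.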

6.3 The small-file index module empties the contents of the <small-file directory path>.log file and obtains the write information for this <small-file directory path>.log, in preparation for writing log entries.

6.4 The small-file index module generates an index record from the small-file path name, data file name, and offset obtained from the client, searches the index data structure Index, inserts the index record at the corresponding position in the Index tree, which is ordered by small-file path name, and writes the create-index operation to the <small-file directory path>.log obtained in 6.3.

6.5 The small-file index module sends the client a signal that the small file was created successfully.

Step 7: The cache module of the client modifies the update flags in the cache record corresponding to the small-file directory. The update flag of the primary data node is set to N, and the update flags of the first replica data node and the second replica data node are set to Y. Go to step 13.

Steps 4 to 7 are the process by which the massive small file storage system creates a small file.

Step 8: SmallFileAPI on the client obtains the small-file directory from the small-file path and searches the client cache module by small-file directory. If the data node information for the small-file directory is not found, perform step 9; if it is found, go to step 10.

Step 9: SmallFileAPI on the client asks the index position maintenance module on the metadata node for the index position of the directory. The index position maintenance module looks up the index position mapping table and returns the entry for this small-file directory to the client, which records the information in the cache module.

Step 10: SmallFileAPI on the client selects any one of the three data nodes and sends it a request to query the small-file index. If the update flag is Y, the small-file index module updates the indexes of all small files under the small-file directory, as follows:

10.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree, and inserts it as a node into the index data structure Index.

10.2 The small-file index module reads the index operation records of <small-file directory path>.log in order and replays the operations they describe.

10.3 The data node queries Index through the small-file index module and returns the index record of the small file to the client.

Step 11: SmallFileAPI on the client uses the data file name in the index record of the small file to query the client cache module for the input information of the data file. If it is not present, SmallFileAPI uses the file-read interface of the distributed file system to obtain the data file input information of the small file and records it in the cache module.

Step 12: SmallFileAPI on the client reads the data of the small file from the data file according to the offset of the small file within the data file given in the index record.

Steps 8 to 12 are the process by which the massive small file storage system reads a small file.
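The read path of steps 8 to 12 can be condensed into a single Python function. The three callables (`query_metadata_node`, `query_index`, `read_at`) are hypothetical stand-ins for the remote calls to the metadata node, the data node's small-file index module, and the distributed file system; a plain dictionary stands in for the client cache.

```python
import random


def read_small_file(path, cache, query_metadata_node, query_index, read_at):
    """Resolve a small file through cache, mapping table, and index, then read it."""
    directory, _, _name = path.rpartition("/")
    directory = directory or "/"
    # Step 8: look for the directory's data node information in the cache.
    entry = cache.get(directory)
    if entry is None:
        # Step 9: ask the metadata node and remember the answer.
        entry = query_metadata_node(directory)
        cache[directory] = entry
    # Step 10: any one of the three data nodes may answer the index query.
    node = random.choice([entry["primary"], entry["replica1"], entry["replica2"]])
    record = query_index(node, path)
    # Steps 11 and 12: open the data file and read at the recorded offset.
    return read_at(record["data_file"], record["offset"])
```

On a second read from the same directory the metadata node is not contacted at all, which is the effect the client cache is meant to achieve.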

Step 13: SmallFileAPI on the client checks whether there is further command input. If there is, go to step 3; if not, end.

The present invention is a method for storing massive small files based on a master-slave distributed file system, and it achieves the following technical effects:

(1) Step 5 stores the small-file data in the master-slave distributed file system, which provides distributed storage and fault tolerance for the data and achieves massive storage capacity and data reliability.

(2) Step 6 distributes the small-file indexes across the data nodes for management, which solves the problem posed by a single metadata node when storing massive small files. In addition, step 6.2 stores the small-file indexes in the distributed file system, so that the file system's own fault-tolerance mechanism protects the indexes and reduces the risk of losing them.

(3) On this basis, the client cache module of the massive small file storage system caches the index position information and data file information of the small files the user commonly accesses, which avoids frequent interaction with the metadata node and greatly improves system performance.

Experiments show that the present invention solves the problem of the huge volume of metadata for massive small files well, that the write efficiency of the massive small file storage system is greatly improved, and that fault tolerance of the small-file index ensures the reliability of the system.

Description of drawings

Fig. 1 is a structural diagram of the master-slave distributed file system described in the background;

Fig. 2 is the overall structural diagram of the massive small file storage system deployed in step 1 of the present invention;

Fig. 3 is the overall flow chart of the present invention;

Fig. 4 is a structural diagram of the index position mapping table of the present invention;

Fig. 5 is a structural diagram of the data file created in step 4 of the present invention;

Fig. 6 is a structural diagram of the small-file index record generated by the small-file index module in step 6.4 of the present invention.

Embodiment

Specific embodiments of the present invention are described below with reference to the drawings.

Fig. 1 is a structural diagram of a master-slave distributed file system.

Fig. 2 is the overall structural diagram of the massive small file storage system built in step 1 of the present invention. The massive small file storage system consists of a master-slave distributed file system and the software on each of its nodes that handles massive small files. This software comprises the index position maintenance module on the metadata node, the small-file index module on each data node, the client cache module, and SmallFileAPI, the client's dedicated interface for operating on small files.

The index position maintenance module assigns data nodes (identified by IP address and port number) to each directory, keeps the mappings between directories and data nodes in sorted order, and returns to the client the data nodes that manage the index of a given small file, that is, the data nodes assigned to the small-file directory. The module stores the mapping between directories and data nodes in the index position mapping table. As shown in Fig. 4, each entry of the table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes the value "Y" or "N". "Y" means that the small-file index for the directory on that data node is up to date and need not be updated; "N" means that it is not up to date and must be updated. The module creates a queue, waitArrangeQueue, whose entries are directory paths. The queue records directories to which data nodes could not be successfully assigned, so that when a new data node joins the distributed file system the module can reassign the directories in waitArrangeQueue. The module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index position mapping table, and waitArrangeDir stores the contents of waitArrangeQueue.

The capacity of the client cache module is initialized to M, so that it can hold the cache records for M directories. Each cache record stores the addresses of the data nodes that manage the small-file index of a directory the user has recently and frequently used (that is, the index position mapping entry for the directory) together with the input/output information of the small files' data files. M is a positive integer set by the user as needed.

The small-file index module receives index creation and query requests from the client, decides from the update flag in the index position mapping table whether the small-file index data must be loaded, and returns to the client either the small-file index or the result of whether creation succeeded. On startup, the module creates the small-file index data structure Index, a B-tree ordered by directory path. Each node of Index is the set of small-file index records under one directory, which is itself a B-tree ordered by small-file name. A small-file index record is the index of one small file and comprises the path name of the small file, the path of its data file, and the offset of the small file within the data file.

SmallFileAPI is the software through which the client exchanges data with the massive small file storage system; it provides the operations for creating and reading small files.
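The two-level shape of Index (an ordered structure of directories whose nodes are ordered structures of file records) can be illustrated with a small Python sketch. A sorted list maintained with `bisect` is swapped in here for the B-tree purely to keep the example short; it preserves the ordering property but not the B-tree's node layout or its performance characteristics. The class names `SortedMap` and `Index` and the method `files_in` are hypothetical.

```python
import bisect


class SortedMap:
    """Keeps keys in sorted order; a simple stand-in for a B-tree."""

    def __init__(self):
        self._keys, self._values = [], []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def keys(self):
        return list(self._keys)


class Index:
    """Outer map ordered by directory path; each value is a map ordered by file name."""

    def __init__(self):
        self._dirs = SortedMap()

    def insert(self, directory, name, data_file, offset):
        files = self._dirs.get(directory)
        if files is None:
            files = SortedMap()
            self._dirs.insert(directory, files)
        files.insert(name, {"path": directory.rstrip("/") + "/" + name,
                            "data_file": data_file, "offset": offset})

    def lookup(self, directory, name):
        files = self._dirs.get(directory)
        return None if files is None else files.get(name)

    def files_in(self, directory):
        files = self._dirs.get(directory)
        return [] if files is None else files.keys()
```

A lookup first locates the directory in the outer structure and then the file name in the inner one, which mirrors how a query for one small file touches only the records of its own directory.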

Fig. 3 is the overall flowchart of the present invention.

Step 1: deploy the massive small-file storage system.

Step 2: initialize the massive small-file storage system.

Step 3: select the operation to perform on a small file; to create a small file, go to step 4; to read a small file, go to step 8.

Step 4: the client's SmallFileAPI creates the data file that will store the small file, according to the path of the small file being created.

Step 5: the client's SmallFileAPI writes the data of the small file into the data file.

Step 6: the small-file index module creates the index of the small file.

Step 7: the client modifies the update flag of the primary data node in the cache record for the small file's directory in the cache module, then goes to step 13.

Step 8: the client's SmallFileAPI derives the small file's directory from the small file's path and looks up the data node information for that directory in the cache module; if no data node information is found, execute step 9; if it is found, go to step 10.

Step 9: the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module on the metadata node, which returns the three data nodes and their update flags to the client; the client records the returned information in the cache module.

Step 10: the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small-file index.

Step 11: the client's SmallFileAPI looks up in the cache module the input information of the data file named in the small file's index record; if it is not there, SmallFileAPI obtains the data file's input information through the file-read interface of the distributed file system and records it in the cache module.

Step 12: the client reads the data of the small file from the data file at the offset given in the small file's index record.

Step 13: the client's SmallFileAPI checks whether further commands are being entered from the keyboard; if so, go to step 3; if not, end.

Fig. 4 shows the structure of the mapping table between small-file directories and data nodes. Each directory corresponds to three data nodes, each recorded by its specific location. The update flag following each data node indicates whether the small-file index currently held in that data node's memory needs to be updated: Y means an update is needed, and N means no update is needed.
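The mapping table and the waiting queue can be sketched as follows. The seven entry fields, the random choice of three data nodes, the initial flag value Y, and waitArrangeQueue come from the text and claims; the class and method names (`MappingEntry`, `IndexLocationTable`, `assign`, `add_data_node`) and the injectable random generator are hypothetical. Persistence to the dirToDatanode and waitArrangeDir files is left out of this sketch.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class MappingEntry:
    directory: str
    primary: str          # "ip:port" of the primary data node
    primary_flag: str     # update flag, "Y" or "N"
    replica1: str
    replica1_flag: str
    replica2: str
    replica2_flag: str


class IndexLocationTable:
    """In-memory sketch of the index position mapping table plus waitArrangeQueue."""

    def __init__(self, data_nodes, rng=None):
        self.data_nodes = list(data_nodes)
        self.entries = {}                    # directory -> MappingEntry
        self.wait_arrange_queue = deque()    # directories still waiting for nodes
        self.rng = rng or random.Random()

    def assign(self, directory):
        """Assign three distinct data nodes, or queue the directory and return None."""
        if directory in self.entries:
            return self.entries[directory]
        if len(self.data_nodes) < 3:
            if directory not in self.wait_arrange_queue:
                self.wait_arrange_queue.append(directory)
            return None
        primary, r1, r2 = self.rng.sample(self.data_nodes, 3)
        entry = MappingEntry(directory, primary, "Y", r1, "Y", r2, "Y")
        self.entries[directory] = entry
        return entry

    def add_data_node(self, node):
        """Register a new data node and retry every queued directory."""
        self.data_nodes.append(node)
        pending, self.wait_arrange_queue = self.wait_arrange_queue, deque()
        for directory in pending:
            self.assign(directory)  # re-queues itself if nodes are still short


table = IndexLocationTable(["10.0.0.1:9000", "10.0.0.2:9000"])
table.assign("/photos")              # only two nodes: the directory is queued
table.add_data_node("10.0.0.3:9000")  # the queued directory now gets its nodes
```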

Fig. 5 shows the structure of the data file in which small files are stored. A data file consists of a data file header followed by data records. The header consists of four fields: the first field occupies three bytes and describes the file type, in order to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number (Version) of the data file; the third field gives the key type, i.e. the data type in which keys are stored; the fourth field gives the value type, i.e. the data type in which values are stored. The header is immediately followed by one or more records, each of which stores the complete data of one small file. Each record consists of four items: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the file name.
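The layout can be illustrated with a short byte-level sketch. Only the field order and the 3-byte and 1-byte sizes of the first two header fields come from the text. Everything else here is an assumption made for illustration: the magic tag `SMF`, the one-byte key-type and value-type codes, the 4-byte big-endian length fields, and the convention that the record length counts the bytes following the length field.

```python
import io
import struct

MAGIC = b"SMF"         # hypothetical 3-byte file-type tag
VERSION = 1            # 1-byte version number
KEY_TYPE_TEXT = 1      # hypothetical 1-byte code for the key type
VALUE_TYPE_BYTES = 2   # hypothetical 1-byte code for the value type


def write_header(f):
    """Write the four header fields: file type, version, key type, value type."""
    f.write(MAGIC + struct.pack(">BBB", VERSION, KEY_TYPE_TEXT, VALUE_TYPE_BYTES))


def append_record(f, name, content):
    """Append one small file as a record at the tail and return its offset."""
    key = name.encode("utf-8")
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    body = struct.pack(">I", len(key)) + key + content   # key length, key, value
    f.write(struct.pack(">I", len(body)) + body)         # record length first
    return offset


def read_record(f, offset):
    """Read the record starting at `offset` and return (file name, content)."""
    f.seek(offset)
    (record_len,) = struct.unpack(">I", f.read(4))
    body = f.read(record_len)
    (key_len,) = struct.unpack(">I", body[:4])
    key = body[4:4 + key_len].decode("utf-8")
    return key, body[4 + key_len:]


# An in-memory buffer stands in for a data file of the distributed file system.
buf = io.BytesIO()
write_header(buf)
offset_a = append_record(buf, "a.txt", b"hello")
offset_b = append_record(buf, "b.txt", b"world!")
name, content = read_record(buf, offset_b)
```

The offset returned by `append_record` plays the role of the offset that is later stored in the small file's index record and used to read the file back.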

Fig. 6 shows the structure of the index record generated by the small-file index module in step 6.4. Each index record contains the path of the small file, the name of the data file, and the offset of the small file within the data file.
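A minimal sketch of the index record and of a per-directory index follows. The three record fields come from the text, and the rebuild from a saved index file plus a log of operations follows the claims. The rest is assumed for illustration: a sorted list maintained with `bisect` is swapped in for the B-tree, a later record for the same path replaces the earlier one, "create" is the only operation type handled, and the names `IndexRecord`, `DirectoryIndex`, and `rebuild` are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    path: str        # path name of the small file
    data_file: str   # name of the data file that holds it
    offset: int      # offset of its record inside the data file


class DirectoryIndex:
    """Index records of one directory, kept ordered by small-file path."""

    def __init__(self):
        self._paths = []    # sorted keys, parallel to self.records
        self.records = []

    def insert(self, record):
        pos = bisect.bisect_left(self._paths, record.path)
        if pos < len(self._paths) and self._paths[pos] == record.path:
            self.records[pos] = record           # assumed: newer entry wins
        else:
            self._paths.insert(pos, record.path)
            self.records.insert(pos, record)

    def lookup(self, path):
        pos = bisect.bisect_left(self._paths, path)
        if pos < len(self._paths) and self._paths[pos] == path:
            return self.records[pos]
        return None


def rebuild(snapshot, log):
    """Load the records of the index file, then replay the logged operations in order."""
    index = DirectoryIndex()
    for record in snapshot:
        index.insert(record)
    for operation, record in log:
        if operation != "create":
            raise ValueError(f"unsupported operation: {operation}")
        index.insert(record)
    return index


snapshot = [IndexRecord("/docs/b.txt", "data1", 40)]
log = [("create", IndexRecord("/docs/a.txt", "data1", 90))]
index = rebuild(snapshot, log)
found = index.lookup("/docs/a.txt")
```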

Claims

1. A method for storing massive small files based on a master-slave distributed file system, characterized by comprising the following steps:

Step 1: deploy the massive small-file storage system, which consists of a master-slave distributed file system and the small-file handling software on each node of the master-slave distributed file system; this software comprises the index position maintenance module on the metadata node, the small-file index module on each data node, the client cache module, and SmallFileAPI, the dedicated client interface for small-file operations; the index position maintenance module assigns data nodes to each directory, organizes the mappings between directories and data nodes, and returns to the client the data nodes that manage a small file's index, i.e. the data nodes assigned to the small file's directory; the index position maintenance module stores the directory-to-data-node mappings in the index position mapping table; each entry of the index position mapping table consists of seven items: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag; each of the three update flags takes one of two values, "Y" or "N"; "Y" indicates that the small-file index for the directory on that data node is up to date and needs no update, and "N" indicates that it is not up to date and needs to be updated; the index position maintenance module creates a queue waitArrangeQueue whose entries are directory paths, which records the directories to which data nodes could not be assigned, so that the index position maintenance module can reassign the directories in waitArrangeQueue when a new data node joins the distributed file system; the index position maintenance module also creates two empty files on disk, dirToDatanode and waitArrangeDir, where dirToDatanode stores the contents of the index position mapping table and waitArrangeDir stores the contents of the queue waitArrangeQueue; the capacity of the client cache module is initialized to M, each cache record stores the index position mapping entry for a directory and the input/output information of the small-file data files, and M is a positive integer; the small-file index module receives index creation and query requests from clients, decides from the update flag in the index position mapping table whether the small-file index data must be loaded, and returns to the client either the small-file index or the result of the creation; on startup the small-file index module creates the small-file index data structure Index, a B-tree ordered by directory path, each node of which is the set of small-file index records under one directory, itself a B-tree ordered by small-file name; a small-file index record is the index of one small file and contains the path name of the small file, the path of its data file, and its offset within the data file; SmallFileAPI is the software through which the client exchanges data with the massive small-file storage system, and provides the operations for creating and reading small files;

2.1 Initialize the index position mapping table by reading its data from file dirToDatanode; if dirToDatanode is empty, the index position mapping table is initialized as an empty table;

2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from file waitArrangeDir; if waitArrangeDir is empty, waitArrangeQueue is initialized as an empty queue;

2.3 Initialize the index data structure Index: Index is initialized as an empty B-tree;

Step 3: the client's SmallFileAPI operates on small files according to the command received from the keyboard; to create a small file, execute step 4; to read a small file, go to step 8;

Step 4: SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the directory indicated by that path, the data file under that directory, and the index position mapping entry for that directory, as follows:

4.1 The client's SmallFileAPI queries the cache module for the information on the small file's directory, which comprises the index position mapping entry for the directory and the input/output information of the small-file data file; if the information can be obtained from the cache module, go to step 5; if it is not found in the cache module, execute step 4.2;

4.2 The client's SmallFileAPI extracts the path of the small file's directory from the path of the small file; if the directory does not exist, it is created, and the index position maintenance module on the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as a table entry; the index position maintenance module assigns the three data nodes to the directory as follows:

4.2.1 The index position maintenance module randomly picks three data nodes from the information on all data nodes maintained by the metadata node; if three data nodes are obtained, their update flags are initialized to Y and the procedure goes to step 4.2.3; if three data nodes cannot be found, execute step 4.2.2;

4.2.2 The directory to which data nodes could not be assigned is added to the queue waitArrangeQueue, the contents of waitArrangeQueue are saved to file waitArrangeDir, a failure signal is returned to the client, and the procedure goes to step 13 and ends the operation;

4.2.3 The index position maintenance module saves the index position mapping table to file dirToDatanode;

4.3 Set variable X = 1;

4.5 If data file dataX exists under the small file's directory, the client's SmallFileAPI requests from the metadata node the output information of data file dataX under that directory; if the output information of dataX is obtained, execute step 4.6; if it is not obtained, increase X by 1; if X <= P, go to 4.4; if X > P, return an error message to the client and go to step 13; P is the number of data files that may be created under the directory, P is a positive integer whose value is set by the user, and typically P = 32;

4.6 The client's SmallFileAPI sends to the index position maintenance module on the metadata node a request to query the data nodes for the small file's directory; the index position maintenance module looks up the index position mapping table and returns the entry for that directory to the client;

4.7 The client's SmallFileAPI records the index position mapping entry for the small file's directory and the output information of the small file's data file in the cache module; if the cache module is full, a record is evicted using the LRU (least recently used) algorithm;

Step 5: the client writes the small file as one data record into the data file dataX obtained in 4.5, and returns the offset of the small file in the data file, i.e. the position of the data record within the data file;

Step 6: the small-file index module on the data node creates the small-file index; the client sends the path of the small file, the name of the data file storing the small file, the offset of the small file in the data file, and the data node update flag to the primary data node among the three data nodes of the index position mapping entry obtained in 4.6, and requests creation of the small-file index; if the primary data node has failed, a failure result is returned to the client and the procedure goes to step 13; if the primary data node is working normally, its small-file index module performs the following after the request is received:

6.2 The small-file index module reads from the distributed file system the two files whose paths are /index/<small-file directory path>.index and /index/<small-file directory path>.log; the <small-file directory path>.index file, called the index file, stores all small-file index records under the directory, saved in a B-tree structure after sorting by small-file path name; the <small-file directory path>.log file, called the log file, stores the records of operations on the small-file index under the directory, each consisting of an operation type and an index record; the index data is read from the index file and the log file as follows:

6.2.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts the B-tree as a node into the in-memory index data structure Index;

6.2.2 The small-file index module reads the index operation records in <small-file directory path>.log in order and replays the operations they record;

6.2.3 The small-file index module renames <small-file directory path>.index to <small-file directory path>.index.tmp, creates a new index file named <small-file directory path>.index, saves into the new index file all index records in Index for this small-file directory, and deletes the index file with the suffix .tmp;

6.3 The small-file index module clears the contents of <small-file directory path>.log and obtains the write-operation information for that file, ready to write log operations;

6.4 The small-file index module generates an index record from the small-file path name, data file name, and offset obtained from the client, searches the index data structure Index, inserts the index record at the position in the Index tree determined by the ordering of small-file path names, and writes the index-creation operation to the <small-file directory path>.log obtained in 6.3;

6.5 The small-file index module sends the client a signal that the small file was created successfully;

Step 7: the client's cache module modifies the update flags in the cache record for the small file's directory: the update flag of the primary data node is changed to N, and the update flags of the first replica data node and the second replica data node are changed to Y; go to step 13;

Step 8: the client's SmallFileAPI derives the small file's directory from the small file's path and searches the client cache module by that directory; if the data node information for the directory cannot be found, execute step 9; if it is found, go to step 10;

Step 9: the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module on the metadata node; the index position maintenance module looks up the index position mapping table and returns the index position mapping entry for the small file's directory to the client; the client records the returned information in the cache module;

Step 10: the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small-file index; if the update flag is Y, the small-file index module updates the index of all small files under the small file's directory, as follows:

10.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts the B-tree as a node into the index data structure Index;

10.2 The small-file index module reads the index operation records of <small-file directory path>.log in order and replays the operations they record;

10.3 The data node queries Index through the small-file index module and returns the index record of the small file to the client;

Step 11: the client's SmallFileAPI queries the client cache module, by the data file name in the small file's index record, for the input information of the data file; if it is not there, SmallFileAPI obtains the data file's input information through the file-read interface of the distributed file system and records it in the cache module;

Step 12: the client's SmallFileAPI reads the data of the small file from the data file at the offset given in the small file's index record;

2. The method for storing massive small files based on a master-slave distributed file system according to claim 1, characterized in that the data file is a file that actually exists in the namespace of the master-slave distributed file system and is used to store the data of all small files under the same directory; the data file consists of a data file header followed by data records; the header consists of four fields: the first field occupies three bytes and describes the file type, in order to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number of the data file; the third field gives the key type, i.e. the data type in which keys are stored; the fourth field gives the value type, i.e. the data type in which values are stored; the header is immediately followed by one or more records, each of which stores the complete data of one small file, and each record consists of four items: record length, key length, key, and value; the key is the file name of the small file, the value is the content of the small file, and the key length is the length of the file name; each small file is stored as one record, and when a small file is stored it is appended directly at the tail of the data file.

3. The method for storing massive small files based on a master-slave distributed file system according to claim 1, characterized in that P = 32.