CN103020315B

CN103020315B - A mass small-file storage method based on a master-slave distributed file system

Info

Publication number: CN103020315B
Application number: CN201310009182.4A
Authority: CN
Inventors: 王蕾; 何连跃; 徐叶; 李姗姗; 戴华东; 吴庆波; 丁滟; 黄辰林; 付松龄
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2013-01-10
Filing date: 2013-01-10
Publication date: 2015-08-19
Anticipated expiration: 2033-01-10
Also published as: CN103020315A

Abstract

The invention discloses a method for storing massive numbers of small files on a master-slave distributed file system, with the aim of solving the problems that arise when such a file system stores massive numbers of small files. In the technical solution, a mass small-file storage system is first deployed and initialized; the client's SmallFileAPI then creates or reads small files according to instructions received from the keyboard. When a small file is created, SmallFileAPI creates the data file for the small file according to the small-file path obtained from the client, writes the small-file data, and at the same time creates a small-file index on a data node. When a small file is read, the data node information corresponding to its parent directory is obtained from the small-file path, an index request is sent to any one of those data nodes, and the small-file data is finally read from the data file according to the index information. The invention addresses the problem of excessive metadata in mass small-file storage, improves the write efficiency of the mass small-file storage system, and ensures the reliability of the system.

Description

A mass small-file storage method based on a master-slave distributed file system

Technical field

The present invention relates to a method for storing massive numbers of small files in a master-slave distributed file system designed for storing massive large files.

Background technology

With the development of new computing technologies, the data of both enterprises and individuals has begun to grow rapidly. Massive data growth brings not only a storage-capacity problem but also challenges for data management and storage performance, and it has become a key problem that the cloud computing era must solve. To ensure high availability, high reliability, and economy, cloud computing stores data in a distributed manner and uses redundant storage to guarantee data reliability. To meet the demands of a large number of users, cloud storage technology must offer high throughput and high transfer rates. Industry and academia have proposed many solutions to the cloud data storage problem, including the Google File System (GFS), Hadoop's open-source file system HDFS, and NoSQL storage systems for semi-structured and structured data such as Dynamo, Cassandra, and MongoDB.

In the early days of cloud computing, storage systems were designed mainly for efficient storage of and access to massive large files, and their support for small files was weak. With the development of personal terminals and the mobile Internet, however, small files account for an ever larger share of cloud storage systems, and efficient storage of and access to massive numbers of small files has become a problem in urgent need of a solution. Small files are files ranging in size from a few KB to tens of KB. For example, Taobao needs to store a massive number of product pictures, all of which are small files; search engines such as Google and Baidu need to crawl hundreds of millions of web pages from the network, and these pages are all small files. Small files numbering on the order of a trillion constitute mass small files. If they cannot be stored efficiently, applications built on them either cannot be implemented or cannot meet customer requirements. The present invention mainly solves the problem of storing mass small files efficiently.

The architecture of a master-slave distributed file system is shown in Fig. 1. A file system of this kind consists of one centralized metadata server (also called the metadata node) and multiple distributed data servers (also called data nodes). The metadata server manages the file system's metadata, including the directory structure and the storage location, size, and other attributes of each file. The data servers store the file system's data, i.e. the files themselves. To access the file system, a client first contacts the metadata server to obtain a file's metadata, and then uses that information to contact the data server holding the file and retrieve it. The advantage of this kind of file system is that its design is simple to implement and easy to manage, achieving high fault tolerance, high reliability, and high throughput with simple techniques. The disadvantage is that a single metadata server becomes the performance bottleneck for system access and is a single point of failure, while a metadata server cluster complicates metadata management and reduces the efficiency of metadata access.

Typical representatives of this kind of distributed file system are the Google File System (GFS), the Hadoop file system HDFS, Lustre, and PVFS. Of these, HDFS is open source and runs on commodity hardware platforms. A cluster running HDFS consists of one metadata node and multiple data nodes. HDFS is a highly fault-tolerant system suited to deployment on inexpensive machines; it provides high-throughput data access, is well suited to applications on large data sets, and supports streaming reads of file system data.

At present, the main methods for storing mass small files in a master-slave distributed file system are the following:

Method one is the Hadoop archive file, HAR (Hadoop Archive) for short. To solve the problem that storing mass small files in HDFS can exhaust the memory of the metadata node, Hadoop introduced the HAR archive file. HAR packs files into HDFS blocks efficiently, reducing the metadata node's memory usage while still allowing transparent access to the files. The method packs multiple small files into one HAR file. HAR files have shortcomings, however. First, once a HAR file has been created it cannot be modified: to add or delete small files, a new HAR file must be created. Second, after a HAR file has been created, the original small files are not deleted automatically and must be removed manually. HAR files are therefore very inefficient for handling mass small files.

Method two is the method proposed by Xuhui Liu, Jizhong Han et al. of the Chinese Academy of Sciences in "Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS": small files are packed into one large file, whose head stores the index information of all small files it contains. The large file is stored as an ordinary file of the master-slave distributed file system. For each small-file retrieval, the client first queries the metadata server for the metadata of the large file, then interacts with the data server to read the large file, obtains the index information from its head, and finally reads the file content that follows. Because a distributed file system designed for massive large files uses streaming reads, random reads are inefficient and have high latency; when multiple clients read multiple small files at the same time, efficiency is very low and flexibility is poor.

Method three is the method disclosed by Jiang Liu et al. of Beijing University of Posts and Telecommunications in "Research on Technologies Related to Small-File Storage Optimization under HDFS": small files are stored in data blocks on the data nodes, and the metadata of a small file, which records information such as its position within the data block, is stored on the data node. The metadata of all small files is also kept on the hard disk of the metadata server. For each small-file retrieval, the client first checks whether the most recently accessed data node holds the small file. If not, it must access the metadata node; the metadata server has to read its hard disk to obtain the complete metadata of the requested small file and return it to the client, which then interacts with the data node to obtain the small file. The problem with this method is that metadata access for small files is inefficient and has long latency.

In summary, current master-slave distributed file systems are all designed around large-file storage. When used to store mass small files, they commonly suffer from low system efficiency, inefficient small-file index lookup, and poor system reliability. How to store mass small files efficiently and reliably on a master-slave distributed file system designed for massive large files is a technical problem of concern to those skilled in the art.

Summary of the invention

The technical problem to be solved by the present invention is as follows. A general master-slave distributed file system can store files at very large data scales and offers high fault tolerance and high scalability, but using it to store massive numbers of small files causes several problems: (1) A master-slave distributed file system with a centralized metadata service has only a single metadata node, and the number of files determines the size of the metadata. Massive numbers of small files consume the metadata node's memory; their metadata can exhaust that memory and exceed the limits of what computer hardware can provide. (2) Retrieval of mass small files is inefficient: once the number of files reaches a certain scale, retrieval efficiency drops sharply and the system runs slowly.

Technical scheme of the present invention is:

Step 1: deploy the mass small-file storage system. The mass small-file storage system consists of the master-slave distributed file system together with software, installed on each node of the file system, for handling mass small files. This software comprises the index location maintenance module on the metadata node, the small-file index module on each data node, the client cache module, and SmallFileAPI, the dedicated client interface for operating on small files.

The index location maintenance module assigns data nodes (identified by IP address and port number) to each directory, sorts the mappings between directories and data nodes, and returns to the client the data nodes that manage the index of a given small file, that is, the data nodes assigned to the small file's directory. It stores the directory-to-data-node mappings in an index location mapping table. Each entry of the table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes one of two values, "Y" or "N". "Y" indicates that the small-file index for this directory on the data node is up to date and does not need updating; "N" indicates that it is not up to date and needs updating. The index location maintenance module creates a queue, waitArrangeQueue, whose entries are directory paths. The queue records directories to which data nodes could not be assigned, so that when new data nodes join the distributed file system the module can reallocate data nodes to the directories in waitArrangeQueue. The module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index location mapping table, and waitArrangeDir stores the contents of waitArrangeQueue.

The client cache module is initialized with capacity M, i.e. it can hold the cache records for M directories. Each cache record stores, for a directory the user has recently and frequently used, the addresses of the data nodes managing its small-file index (that is, the directory's index location mapping entry) and the input/output information of the small-file data file. M is a positive integer set by the user as needed.

The small-file index module receives index creation or query requests from clients, decides from the update flag in the index location mapping entry whether small-file index data must be loaded, and returns to the client either the small-file index or the result (success or failure) of the creation. On startup the small-file index module creates the small-file index data structure Index, a B-tree (see R. Bayer and E. M. McCreight, "Organization and Maintenance of Large Ordered Indexes", Acta Informatica, 1972) ordered by directory path. Each node of Index holds the set of small-file index records under one directory, which is itself a B-tree ordered by small-file name. A small-file index record represents the index of one small file and contains the path name of the small file, the path of its data file, and the offset of the small file within the data file.

SmallFileAPI is the software through which the client exchanges data with the mass small-file storage system; it provides the operations for creating and reading small files.
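As an illustration of the seven-field index location mapping table, the following is a minimal Python sketch, not the patented implementation. The class names `MappingEntry` and `IndexLocationTable`, the `ip:port` string form, and the tab-separated layout used to stand in for the file dirToDatanode are all assumptions made for the example; the source only states which fields exist and that the table is persisted.

```python
from dataclasses import dataclass, astuple
from typing import Dict, Optional


@dataclass
class MappingEntry:
    """One seven-field entry of the index location mapping table."""
    directory: str
    primary: str        # data node identified as "ip:port" (assumed encoding)
    primary_flag: str   # update flag, "Y" or "N"
    replica1: str
    replica1_flag: str
    replica2: str
    replica2_flag: str


class IndexLocationTable:
    """In-memory table keyed by directory path (hypothetical helper)."""

    def __init__(self) -> None:
        self.entries: Dict[str, MappingEntry] = {}

    def put(self, entry: MappingEntry) -> None:
        self.entries[entry.directory] = entry

    def lookup(self, directory: str) -> Optional[MappingEntry]:
        return self.entries.get(directory)

    def dumps(self) -> str:
        """Serialize one tab-separated line per directory, sorted by path.

        The on-disk layout of dirToDatanode is not given in the source;
        this format is an assumption.
        """
        lines = ["\t".join(astuple(self.entries[d])) for d in sorted(self.entries)]
        return "\n".join(lines)

    @classmethod
    def loads(cls, text: str) -> "IndexLocationTable":
        """Rebuild the table; empty text yields an empty table."""
        table = cls()
        for line in text.splitlines():
            if line:
                table.put(MappingEntry(*line.split("\t")))
        return table
```

Saving the whole table after every change, as the description requires, then amounts to writing the result of `dumps()` to dirToDatanode each time `put` is called.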

Step 2: initialize the mass small-file storage system, comprising the following steps:

2.1 Initialize the index location mapping table by reading its data from the file dirToDatanode. If dirToDatanode is empty, the table is initialized as an empty table. Thereafter, whenever the table is modified, all of its data is saved to dirToDatanode again.

2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir. If waitArrangeDir is empty, waitArrangeQueue is initialized as an empty queue. Thereafter, whenever waitArrangeQueue is modified, all of its data is saved to waitArrangeDir again.

2.3 Initialize the index data structure Index as an empty B-tree. Index loads index data from the index files and index log files dynamically, on demand from client requests.

Step 3: the client's SmallFileAPI operates on small files according to instructions received from the keyboard. To create a small file, go to step 4; to read a small file, go to step 8.

Step 4: SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the data file under the directory indicated by the small-file path (the small-file directory for short) and the index location mapping entry corresponding to the small-file directory. A data file is a real file in the namespace of the master-slave distributed file system and stores the data of all small files under the same directory. A data file consists of a data file header followed by data records. The header has four fields: the first occupies three bytes and describes the file type, distinguishing data files from ordinary files; the second occupies one byte and gives the version number (Version) of the data file; the third gives the key type, i.e. the data type in which keys are stored; the fourth gives the value type, i.e. the data type in which values are stored. The header is followed immediately by one or more records, each storing the complete data of one small file. Each record consists of four items: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the file name. Each small file is stored as one record, appended directly to the end of the data file.
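The data file layout described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the source fixes only the 3-byte type field and the 1-byte version field, so the 1-byte key-type and value-type fields, the `SMF` tag, the 4-byte big-endian length fields, and the convention that record length equals key length plus value length are all choices made for the example.

```python
import io
import struct

MAGIC = b"SMF"          # hypothetical 3-byte file-type tag
VERSION = 1             # 1-byte version number
KEY_TYPE_TEXT = 1       # assumed 1-byte code: keys are UTF-8 text
VALUE_TYPE_BYTES = 2    # assumed 1-byte code: values are raw bytes
HEADER_SIZE = 6


def write_header(f) -> None:
    """Write the four header fields at the start of an empty data file."""
    f.write(MAGIC + bytes([VERSION, KEY_TYPE_TEXT, VALUE_TYPE_BYTES]))


def append_record(f, name: str, content: bytes) -> int:
    """Append one small file as a record and return its offset in the file."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    key = name.encode("utf-8")
    # record length (assumed: key bytes + value bytes), then key length
    f.write(struct.pack(">II", len(key) + len(content), len(key)))
    f.write(key)
    f.write(content)
    return offset


def read_record(f, offset: int):
    """Read the record that starts at the given offset."""
    f.seek(offset)
    record_len, key_len = struct.unpack(">II", f.read(8))
    key = f.read(key_len).decode("utf-8")
    value = f.read(record_len - key_len)
    return key, value
```

The offset returned by `append_record` is the value that would be stored in the small-file index record, so a later read needs only one seek into the data file.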

4.1 The client's SmallFileAPI checks whether the cache module contains the information for the small-file directory. This information comprises the index location mapping entry for the directory and the input/output information of the small-file data file. If it can be obtained from the cache module, go to step 5; if it is not found in the cache module, go to step 4.2.

4.2 The client's SmallFileAPI extracts the path of the small-file directory from the small-file path. If the small-file directory does not exist, it is created; at the same time the index location maintenance module of the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index location mapping table as a new entry.

The index location maintenance module assigns three data nodes to the directory as follows:

4.2.1 The index location maintenance module randomly selects three data nodes from the full data node information maintained by the metadata node (data nodes of a master-slave distributed file system register with the metadata node, so the metadata node holds information about every data node in the cluster). If three data nodes are obtained, their update flags are initialized to Y and execution goes to step 4.2.3; if three data nodes cannot be found, step 4.2.2 is executed.

4.2.2 The directory that could not be assigned data nodes is added to the queue waitArrangeQueue, and the contents of waitArrangeQueue are saved to the file waitArrangeDir again. An operation-failure signal is returned to the client, and execution goes to step 13, ending the operation.

4.2.3 The index location maintenance module saves the index location mapping table to the file dirToDatanode again.
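Steps 4.2.1 to 4.2.3 can be sketched as a single function. This is a hedged illustration only: `allocate_nodes`, the dictionary-based entry, and the in-memory `table` and `wait_queue` objects are hypothetical names, and persisting to dirToDatanode and waitArrangeDir is left out.

```python
import random
from collections import deque


def allocate_nodes(directory, live_nodes, table, wait_queue, rng=random):
    """Assign three distinct data nodes to a directory, or queue it.

    live_nodes: "ip:port" identifiers registered with the metadata node.
    table:      dict standing in for the index location mapping table.
    wait_queue: deque standing in for waitArrangeQueue.
    Returns the new entry, or None when fewer than three nodes exist.
    """
    if len(live_nodes) < 3:
        # Step 4.2.2: remember the directory for a later retry.
        wait_queue.append(directory)
        return None
    # Step 4.2.1: pick three nodes at random, all update flags start as "Y".
    primary, replica1, replica2 = rng.sample(list(live_nodes), 3)
    entry = {
        "directory": directory,
        "primary": primary, "primary_flag": "Y",
        "replica1": replica1, "replica1_flag": "Y",
        "replica2": replica2, "replica2_flag": "Y",
    }
    # Step 4.2.3 would also save the whole table to dirToDatanode here.
    table[directory] = entry
    return entry
```

When a new data node registers, the maintenance module could drain `wait_queue` and call `allocate_nodes` again for each queued directory.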

4.3 Set variable X = 1.

4.4 If the data file dataX does not exist under the small-file directory, create dataX under the small-file directory through the metadata node.

4.5 If the data file dataX exists under the small-file directory, the client's SmallFileAPI requests from the metadata node the output information of dataX. If the output information is obtained (that is, dataX is not currently held by another client), go to step 4.6. If it is not obtained, increment X by 1; if X <= P, go to 4.4; if X > P, return an error message to the client and go to step 13. P is the number of data files that may be created under this directory; it is a positive integer set by the user, typically P = 32.

4.6 The client's SmallFileAPI asks the index location maintenance module of the metadata node for the data nodes corresponding to the small-file directory. The index location maintenance module looks up the index location mapping table and returns the entry for this small-file directory to the client.

4.7 The client's SmallFileAPI records the index location mapping entry for the small-file directory and the output information of the small file's data file in the cache module. If the cache module is full, a record is evicted using the LRU (least recently used) algorithm.
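The client cache of capacity M with LRU eviction can be sketched with Python's `collections.OrderedDict`. The class name `DirectoryCache` and the shape of the cached record are assumptions for illustration; the source specifies only the capacity M, the per-directory records, and LRU eviction.

```python
from collections import OrderedDict


class DirectoryCache:
    """Per-directory client cache holding at most `capacity` (M) records."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("M must be a positive integer")
        self.capacity = capacity
        self._records = OrderedDict()  # least recently used entry first

    def get(self, directory):
        """Return the cached record, marking it as most recently used."""
        if directory not in self._records:
            return None
        self._records.move_to_end(directory)
        return self._records[directory]

    def put(self, directory, record) -> None:
        """Insert or refresh a record, evicting the LRU one when full."""
        self._records[directory] = record
        self._records.move_to_end(directory)
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)
```

A record here would typically bundle the directory's mapping entry with the open data file handle, so that a cache hit in step 4.1 skips both metadata-node round trips.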

Step 5: the client writes the data of the small file into the data file. The client writes the small file as one data record into the data file dataX obtained in step 4.5, and the offset of the small file within the data file, i.e. the position of the data record in the data file, is returned.

Step 6: the small-file index module of the data node creates the small-file index. The client sends the small-file path, the name of the data file storing the small file, the offset of the small file in the data file, and the data node update flag to the primary data node among the three data nodes of the index location mapping entry obtained in step 4.6, and requests creation of the small-file index. If the primary data node has failed, a failure result is returned to the client and execution goes to step 13. If the primary data node is operating normally, its small-file index module proceeds as follows after receiving the request:

6.1 The small-file index module decides from the update flag of the primary data node whether the small-file index under this directory must be updated: if the update flag is Y, go to 6.2; if the update flag is N, go to 6.3.

6.2 The small-file index module reads two files in the distributed file system, with paths /index/<small-file directory path>.index and /index/<small-file directory path>.log. The <small-file directory path>.index file is called the index file; it holds all small-file index records under the directory, stored in a B-tree structure ordered by small-file path name. The <small-file directory path>.log file is called the log file; it holds records of operations on the small-file index under the directory, including index creation and deletion. Each log record consists of an operation type and an index record, where the operation type is the action performed, such as create or delete.

The small-file index module reads the index data from the <small-file directory path>.index file and the <small-file directory path>.log file as follows:

6.2.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into the in-memory index data structure Index.

6.2.2 The small-file index module reads the index operation records in <small-file directory path>.log in order and replays them. If the operation type is create, the index information of the operation record is extracted and inserted into the index data structure Index using the B-tree insertion algorithm.

6.2.3 The small-file index module renames <small-file directory path>.index to <small-file directory path>.index.tmp, creates a new index file named <small-file directory path>.index, saves all index records of this small-file directory held in Index into the new index file, and deletes the index file with the .tmp suffix.
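The load, replay, and rewrite sequence of steps 6.2.1 to 6.2.3 can be sketched as follows. This is a simplified illustration: a plain Python dict stands in for the per-directory B-tree, the index file and log file are represented as in-memory lists, and the function names are hypothetical. The source describes replay only for the create operation; handling delete symmetrically is an assumption.

```python
def rebuild_index(index_records, log_ops):
    """Load the saved index records, then replay the operation log.

    index_records: iterable of (path, data_file, offset) from the .index file.
    log_ops:       iterable of (op, (path, data_file, offset)) from the .log
                   file, in the order in which they were written.
    """
    index = {}
    for path, data_file, offset in index_records:      # step 6.2.1
        index[path] = (data_file, offset)
    for op, (path, data_file, offset) in log_ops:       # step 6.2.2
        if op == "create":
            index[path] = (data_file, offset)
        elif op == "delete":                             # assumed behavior
            index.pop(path, None)
    return index


def checkpoint(index):
    """Produce the records for the new .index file, ordered by path name.

    Step 6.2.3 writes these to a fresh .index file after renaming the old
    one to .index.tmp, and then deletes the .tmp file.
    """
    return [(path, *index[path]) for path in sorted(index)]
```

Writing the new index file before deleting the renamed old one means that a crash in the middle leaves at least one complete copy of the index on the distributed file system.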

6.3 The contents of the <small-file directory path>.log file are cleared, and the small-file index module obtains the write information for this log file in preparation for writing log records.

6.4 The small-file index module generates an index record from the small-file path name, data file name, and offset information received from the client, searches the index data structure Index, inserts the index record at the correct position in the Index tree in small-file path name order, and writes the index creation operation to the <small-file directory path>.log file obtained in 6.3.

6.5 The small-file index module sends the client a signal that the small file was created successfully.
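Steps 6.4 and 6.5 can be sketched as follows. This is an illustration only: a sorted Python list maintained with `bisect` stands in for the per-directory B-tree, an in-memory list stands in for the log file, and the class name `DirectoryIndex` is hypothetical.

```python
import bisect


class DirectoryIndex:
    """Index of the small files under one directory, ordered by path name."""

    def __init__(self) -> None:
        self.paths = []     # sorted small-file path names
        self.records = {}   # path -> (data_file, offset)
        self.log = []       # stands in for <small-file directory path>.log

    def create(self, path: str, data_file: str, offset: int) -> bool:
        """Insert an index record in sorted position and log the operation."""
        if path not in self.records:
            bisect.insort(self.paths, path)
        self.records[path] = (data_file, offset)
        self.log.append(("create", path, data_file, offset))
        return True  # step 6.5: report success to the client

    def lookup(self, path: str):
        """Return (data_file, offset) for a small file, or None."""
        return self.records.get(path)
```

Appending to the log on every create is what allows the other two data nodes to bring their own in-memory index up to date later by replaying the log file.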

Step 7: the client cache module modifies the update flags in the cache record corresponding to the small-file directory. The update flag of the primary data node is set to N, and the update flags of the first replica data node and the second replica data node are set to Y. Go to step 13.

Steps 4 to 7 constitute the small-file creation procedure of the mass small-file storage system.

Step 8: the client's SmallFileAPI derives the small-file directory from the small-file path and looks up the client cache module by small-file directory. If the data node information for the small-file directory is not found, go to step 9; if it is found, go to step 10.

Step 9: the client's SmallFileAPI sends a request to the index location maintenance module of the metadata node to query the directory's index location. The index location maintenance module looks up the index location mapping table and returns the entry for this small-file directory to the client, and the client records the information in the cache module.

Step 10: the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small-file index. If the update flag is Y, the small-file index module updates the index of all small files under the small-file directory. The specific steps are as follows:

10.1 The small-file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into the index data structure Index.

10.2 The small-file index module reads the index operation records in <small-file directory path>.log in order and replays them.

10.3 The small-file index module queries Index, and the data node returns the index record of the small file to the client.

Step 11: using the data file name in the small file's index record, the client's SmallFileAPI queries the client cache module for the input information of the data file. If it is not cached, SmallFileAPI obtains the data file input information through the distributed file system's file-read interface and records it in the cache module.

Step 12: the client's SmallFileAPI reads the small file's data from the data file according to the offset of the small file in the data file recorded in its index record.

Steps 8 to 12 constitute the small-file read procedure of the mass small-file storage system.
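The read procedure of steps 8 to 12 can be summarized in a short sketch. This is a hedged outline with assumptions: `read_small_file` and its three callback parameters are hypothetical stand-ins for the metadata-node lookup, the data-node index query, and the distributed file system read; a plain dict stands in for the client cache; and update-flag handling and the data file handle cache of step 11 are omitted.

```python
import random


def read_small_file(path, cache, metadata_lookup, query_index, read_at, rng=random):
    """Read one small file by following steps 8 to 12.

    cache:           dict mapping directory -> {"nodes": [three data nodes]}
    metadata_lookup: callable(directory) -> mapping entry      (step 9)
    query_index:     callable(node, path) -> (data_file, offset) (step 10)
    read_at:         callable(data_file, offset) -> bytes      (steps 11-12)
    """
    # Step 8: derive the small-file directory and consult the client cache.
    directory = path.rsplit("/", 1)[0] or "/"
    entry = cache.get(directory)
    if entry is None:
        # Step 9: ask the metadata node and cache the answer.
        entry = metadata_lookup(directory)
        cache[directory] = entry
    # Step 10: any one of the three data nodes can answer the index query.
    node = rng.choice(entry["nodes"])
    data_file, offset = query_index(node, path)
    # Steps 11-12: read the record at the indexed offset of the data file.
    return read_at(data_file, offset)
```

After the first read in a directory, later reads touch only a data node and the data file, which is how the client cache keeps load off the single metadata node.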

Step 13: the client's SmallFileAPI checks whether further instructions are being input. If so, go to step 3; if not, end.

The present invention is a mass small-file storage method based on a master-slave distributed file system. It achieves the following technical effects:

(1) In step 5, small-file data is stored in the master-slave distributed file system, which provides distributed storage and fault tolerance for the data and thus achieves massive storage capacity and data reliability.

(2) In step 6, small-file indexes are distributed across the data nodes for management, which resolves the single-metadata-node problem in storing mass small files. In addition, step 6.2 stores the small-file index in the distributed file system, so the file system's own fault-tolerance mechanism protects the index and reduces the risk of losing it.

(3) On this basis, the client cache module caches the small-file index location information and data file information that the user commonly uses, which avoids frequent interaction with the metadata node and greatly improves system performance.

Experiments show that the present invention effectively solves the problem of excessive metadata in mass small-file storage, that the write efficiency of the mass small-file storage system is greatly improved, and that the fault tolerance of the small-file index ensures the reliability of the system.

Brief description of the drawings

Fig. 1 is a structural diagram of the master-slave distributed file system of the background art;

Fig. 2 is an overall structural diagram of the mass small-file storage system deployed in step 1 of the present invention;

Fig. 3 is the overall flowchart of the present invention;

Fig. 4 is a structural diagram of the index location mapping table of the present invention;

Fig. 5 is a structural diagram of the data file created in step 4 of the present invention;

Fig. 6 is a structural diagram of the small-file index record generated by the small-file index module in step 6.4 of the present invention.

Embodiment

Specific embodiments of the present invention are described below with reference to the accompanying drawings.

Fig. 1 is a structural diagram of a master-slave distributed file system.

Fig. 2 is an overall structural diagram of the mass small-file storage system built in step 1 of the present invention. The mass small-file storage system consists of the master-slave distributed file system together with software, installed on each node of the file system, for handling mass small files. This software comprises the index location maintenance module on the metadata node, the small-file index module on each data node, the client cache module, and SmallFileAPI, the dedicated client interface for operating on small files. The index location maintenance module assigns data nodes (identified by IP address and port number) to each directory, sorts the mappings between directories and data nodes, and returns to the client the data nodes that manage the index of a given small file, that is, the data nodes assigned to the small file's directory. It stores the directory-to-data-node mappings in an index location mapping table. As shown in Fig. 4, each entry of the table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes one of two values, "Y" or "N". "Y" indicates that the small-file index for this directory on the data node is up to date and does not need updating; "N" indicates that it is not up to date and needs updating. The index location maintenance module creates a queue, waitArrangeQueue, whose entries are directory paths. The queue records directories to which data nodes could not be assigned, so that when new data nodes join the distributed file system the module can reallocate data nodes to the directories in waitArrangeQueue. The module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index location mapping table, and waitArrangeDir stores the contents of waitArrangeQueue. The client cache module is initialized with capacity M, i.e. it can hold the cache records for M directories. Each cache record stores, for a directory the user has recently and frequently used, the addresses of the data nodes managing its small-file index (that is, the directory's index location mapping entry) and the input/output information of the small-file data file. M is a positive integer set by the user as needed. The small-file index module receives index creation or query requests from clients, decides from the update flag in the index location mapping entry whether small-file index data must be loaded, and returns to the client either the small-file index or the result (success or failure) of the creation. On startup the small-file index module creates the small-file index data structure Index, a B-tree ordered by directory path. Each node of Index holds the set of small-file index records under one directory, which is itself a B-tree ordered by small-file name. A small-file index record represents the index of one small file and contains the path name of the small file, the path of its data file, and the offset of the small file within the data file. SmallFileAPI is the software through which the client exchanges data with the mass small-file storage system; it provides the operations for creating and reading small files.

Fig. 3 is the overall flow chart of the present invention.

First step, deploy the massive small file storage system.

Second step, initialize the massive small file storage system.

Third step, select the operation to perform on a small file: to create a small file, go to the fourth step; to read a small file, go to the eighth step.

Fourth step, the client's SmallFileAPI creates the data file that will store the small file, according to the path of the small file to be created.

Fifth step, the client's SmallFileAPI writes the data of the small file into the data file.

Sixth step, the small file index module creates the index of the small file.

Seventh step, the client modifies the update flag of the primary data node in the cache record that corresponds to the small file's directory in the cache module. Go to the thirteenth step.

Eighth step, the client's SmallFileAPI derives the small file's directory from the small file path and looks up the data node information for that directory in the cache module. If the data node information is not found, perform the ninth step; if it is found, go to the tenth step.

Ninth step, the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module of the metadata node, which returns the three data nodes and their update flags to the client; the client records the information obtained in the cache module.

Tenth step, the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small file index.

Eleventh step, using the data file name in the small file's index record, the client's SmallFileAPI looks up the input information of the data file in the cache module. If it is not there, SmallFileAPI obtains the data file input information of the small file through the file reading interface of the distributed file system and records it in the cache module.

Twelfth step, the client reads the data of the small file from the data file, using the offset of the small file within the data file given in the small file's index record.

Thirteenth step, the client's SmallFileAPI checks whether further instructions are being input from the keyboard; if so, go to the third step; if not, end.
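
The cache lookup in the eighth and ninth steps, together with the capacity M and the LRU eviction described for the cache module, can be illustrated with a short sketch. The names `DirectoryCache` and `lookup_data_nodes`, and the `query_metadata_node` callback standing in for the request to the index position maintenance module, are assumptions made for illustration only.

```python
import posixpath
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Record = Tuple[str, str, str]  # hypothetical: the three data node addresses


class DirectoryCache:
    """Client cache holding at most `capacity` (M) directory records, LRU-evicted."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("M must be a positive integer")
        self.capacity = capacity
        self._records: "OrderedDict[str, Record]" = OrderedDict()

    def get(self, directory: str) -> Optional[Record]:
        record = self._records.get(directory)
        if record is not None:
            self._records.move_to_end(directory)  # mark as most recently used
        return record

    def put(self, directory: str, record: Record) -> None:
        self._records[directory] = record
        self._records.move_to_end(directory)
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)  # evict least recently used


def lookup_data_nodes(
    cache: DirectoryCache,
    small_file_path: str,
    query_metadata_node: Callable[[str], Record],
) -> Record:
    """Eighth step: try the cache; ninth step: fall back to the metadata node."""
    directory = posixpath.dirname(small_file_path)
    record = cache.get(directory)
    if record is None:
        record = query_metadata_node(directory)
        cache.put(directory, record)
    return record
```

Repeated reads under the same directory then reach the metadata node only once, which is the purpose of caching the index position mapping entry on the client.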

Fig. 4 shows the structure of the mapping table between small file directories and data nodes. Each directory corresponds to three data nodes, and each data node field gives the specific location of that data node. The update flag that follows each data node indicates whether the small file index currently held in that data node's memory needs to be updated: Y means it needs to be updated, and N means it does not.

Fig. 5 shows the structure of the data file in which small files are stored. A data file consists of a data file header followed by data records. The data file header consists of four fields: the first field occupies three bytes and describes the file type, so that data files can be distinguished from other ordinary files; the second field occupies one byte and gives the version number (Version) of the data file; the third field gives the key type, that is, the data type in which the key is stored; the fourth field gives the value type, that is, the data type in which the value is stored. The data file header is immediately followed by one or more records, and each record stores the complete data of one small file. Each record consists of four items: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the small file's file name.
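
The layout in Fig. 5 can be made concrete with a small encoder and decoder. The text fixes only the sizes of the first two header fields (three bytes and one byte), so everything else here is an assumption for illustration: the magic bytes `b"SFD"`, the length-prefixed UTF-8 encoding of the key and value type names, the 4-byte big-endian record length and key length, and the convention that record length counts the bytes after the record length field.

```python
import struct
from typing import Tuple

MAGIC = b"SFD"   # hypothetical 3-byte file type tag
VERSION = 1      # 1-byte version number


def _pack_str(text: str) -> bytes:
    """Length-prefixed UTF-8 string (assumed encoding for the type fields)."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def build_header(key_type: str = "text", value_type: str = "bytes") -> bytes:
    """Header: file type (3 bytes), version (1 byte), key type, value type."""
    return MAGIC + bytes([VERSION]) + _pack_str(key_type) + _pack_str(value_type)


def append_record(buf: bytearray, name: str, content: bytes) -> int:
    """Append one small file as a record at the tail; return its offset."""
    key = name.encode("utf-8")
    offset = len(buf)
    # Assumed convention: record length = key-length field + key + value.
    record_len = 4 + len(key) + len(content)
    buf += struct.pack(">II", record_len, len(key)) + key + content
    return offset


def read_record(buf: bytes, offset: int) -> Tuple[str, bytes]:
    """Read the record starting at `offset`; return (file name, content)."""
    record_len, key_len = struct.unpack_from(">II", buf, offset)
    key_start = offset + 8
    value_start = key_start + key_len
    value_end = offset + 4 + record_len
    name = bytes(buf[key_start:value_start]).decode("utf-8")
    return name, bytes(buf[value_start:value_end])
```

The offset returned by `append_record` is exactly the value that an index record needs in order to locate the small file later with `read_record`.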

Fig. 6 shows the structure of the index record generated by the small file index module in step 6.4. Each index record contains the path of the small file, the data file name, and the offset of the small file within the data file.
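
A rough sketch of how such index records might be kept in order and rebuilt is given below. It deliberately swaps the B-tree named in the text for a sorted list searched with `bisect`, which preserves the ordering by small file path but is not the same data structure. The names `IndexRecord`, `DirectoryIndex` and `rebuild`, and the operation type string `"create"`, are hypothetical; the rebuild mirrors the idea of loading the index file and then replaying the log file.

```python
import bisect
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class IndexRecord:
    path: str        # path name of the small file
    data_file: str   # name of the data file that stores it
    offset: int      # offset of the small file within the data file


class DirectoryIndex:
    """Index records of one directory, kept sorted by small file path."""

    def __init__(self) -> None:
        self._paths: List[str] = []
        self._records: List[IndexRecord] = []

    def insert(self, record: IndexRecord) -> None:
        i = bisect.bisect_left(self._paths, record.path)
        if i < len(self._paths) and self._paths[i] == record.path:
            self._records[i] = record          # replace an existing entry
        else:
            self._paths.insert(i, record.path)
            self._records.insert(i, record)

    def find(self, path: str) -> Optional[IndexRecord]:
        i = bisect.bisect_left(self._paths, path)
        if i < len(self._paths) and self._paths[i] == path:
            return self._records[i]
        return None


def rebuild(
    snapshot: Iterable[IndexRecord],
    log: Iterable[Tuple[str, IndexRecord]],
) -> DirectoryIndex:
    """Load the records of the index file, then replay the logged operations."""
    index = DirectoryIndex()
    for record in snapshot:
        index.insert(record)
    for operation, record in log:
        if operation == "create":  # assumed name of the only logged operation
            index.insert(record)
    return index
```

A lookup with `find` returns the data file name and offset, which is the information the client needs in order to read the small file from its data file.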

Claims

1. A massive small file storage method based on a master-slave distributed file system, characterized by comprising the following steps:

First step, deploy the massive small file storage system, which consists of the master-slave distributed file system and the software for handling massive small files on each node of the master-slave distributed file system; this software comprises the index position maintenance module on the metadata node, the small file index module on the data nodes, the client cache module, and SmallFileAPI, the dedicated interface through which the client operates on small files; the index position maintenance module assigns data nodes to each directory, sorts the mappings between directories and data nodes, and returns to the client the data nodes that manage the index of a given small file, namely the data nodes assigned to the small file's directory; the index position maintenance module uses an index position mapping table to store the mappings between directories and data nodes; the index position mapping table consists of seven items: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag; each of the three update flags takes one of two values, "Y" or "N"; "Y" indicates that the small file index for this directory on the data node is up to date and does not need to be updated, and "N" indicates that it is not up to date and needs to be updated; the index position maintenance module creates the queue waitArrangeQueue, whose entries are directory paths and which records the directories to which data nodes could not be successfully assigned; when a new data node joins the distributed file system, the index position maintenance module reassigns the directories in waitArrangeQueue; the index position maintenance module also creates two empty files on disk, dirToDatanode and waitArrangeDir, where dirToDatanode stores the contents of the index position mapping table and waitArrangeDir stores the contents of the queue waitArrangeQueue; the capacity of the client's cache module is initialized to M, each cache record stores the index position mapping entry for a directory and the input/output information of the data files of the small files, and M is a positive integer; the small file index module receives index creation or query requests from the client, decides from the update flags in the index position mapping table whether the small file index data must be loaded, and returns to the client either the small file index or the result indicating whether creation succeeded; on startup the small file index module creates the small file index data structure Index, a B-tree ordered by directory path, each node of which is the set of small file index records under one directory and is itself a B-tree ordered by small file name; a small file index record represents the index of one small file and comprises the path name of the small file, the path of the data file that stores it, and the offset of the small file within that data file; SmallFileAPI is the software through which the client exchanges data with the massive small file storage system, and it includes the operations for creating and reading small files;

Second step, initialize the massive small file storage system, as follows:

2.1 initialize the index position mapping table by reading its data from the file dirToDatanode; if dirToDatanode is empty, the index position mapping table is initialized as an empty table;

2.2 initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir; if waitArrangeDir is empty, waitArrangeQueue is initialized as an empty queue;

2.3 initialize the index data structure Index as an empty B-tree;

Third step, the client's SmallFileAPI operates on small files according to the instruction received from the keyboard: to create a small file, perform the fourth step; to read a small file, go to the eighth step;

Fourth step, SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the small file's directory as indicated by that path, the index position mapping entry corresponding to the directory, and the data file under the directory, as follows:

4.1 the client's SmallFileAPI queries whether the cache module contains the information for the small file's directory, which comprises the index position mapping entry for the directory and the input/output information of the small file data file; if the information can be obtained from the cache module, go to the fifth step; if it is not found in the cache module, perform step 4.2;

4.2 the client's SmallFileAPI extracts the path of the small file's directory from the path of the small file; if the directory does not exist, the directory is created, and at the same time the index position maintenance module of the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as a table entry; the index position maintenance module assigns the three data nodes to the directory as follows:

4.2.1 the index position maintenance module picks three data nodes at random from the information on all data nodes maintained by the metadata node; if three data nodes are obtained, the update flags of these three data nodes are initialized to Y and processing goes to step 4.2.3; if three data nodes cannot be found, perform step 4.2.2;

4.2.2 add the directory to which data nodes could not be assigned to the queue waitArrangeQueue, save the contents of waitArrangeQueue to the file waitArrangeDir, return an operation failure signal to the client, and go to the thirteenth step to end the operation;

4.2.3 the index position maintenance module saves the index position mapping table to the file dirToDatanode;

4.3 set the variable X = 1;

4.4 if the data file dataX does not exist under the small file's directory, the metadata node creates dataX under the small file's directory;

4.5 if the data file dataX exists under the small file's directory, the client's SmallFileAPI requests the output information of dataX from the metadata node; if the output information of dataX is obtained, perform step 4.6; if it is not obtained, increase X by 1, and if X <= P go to step 4.4, or if X > P return an error message to the client and go to the thirteenth step; P is the number of data files that can be created under the directory, P is a positive integer, and the value of P is set by the user;

4.6 the client's SmallFileAPI sends to the index position maintenance module of the metadata node a request to query the data nodes corresponding to the small file's directory; the index position maintenance module looks up the index position mapping table and returns the entry for the small file's directory to the client;

4.7 the client's SmallFileAPI records the index position mapping entry for the small file's directory and the data file output information of the small file in the cache module; if the cache module is full, a record is evicted using the LRU (least recently used) algorithm;

Fifth step, the client writes the small file as one data record into the data file dataX obtained in step 4.5, and returns the offset of the small file within the data file, that is, the position of the data record within the data file;

Sixth step, the small file index module of the data node creates the small file index; the client sends the small file path, the name of the data file storing the small file, the offset of the small file within the data file, and the data node update flag to the primary data node among the three data nodes of the index position mapping entry obtained in step 4.6, and requests creation of the small file index; if the primary data node has failed, a failure result is returned to the client and processing goes to the thirteenth step; if the primary data node is normal, then after it receives the request its small file index module works as follows:

6.1 the small file index module decides from the update flag of the primary data node whether the small file index for this directory must be updated: if the update flag is Y, perform step 6.2; if the update flag is N, go to step 6.3;

6.2 the small file index module reads two files in the distributed file system, whose paths are /index/<small file directory path>.index and /index/<small file directory path>.log; the file <small file directory path>.index is called the index file and holds all small file index records under this directory, stored in a B-tree structure after being sorted by small file path name; the file <small file directory path>.log is called the log file and holds the records of operations on the small file index under this directory, each consisting of an operation type and an index record; the index data is read from <small file directory path>.index and <small file directory path>.log as follows:

6.2.1 the small file index module reads the data of <small file directory path>.index, generates a B-tree from this data, and inserts it as a node into the in-memory index data structure Index;

6.2.2 the small file index module reads the index operation records in <small file directory path>.log in order and re-executes the operations they describe;

6.2.3 the small file index module renames <small file directory path>.index to <small file directory path>.index.tmp, creates a new index file named <small file directory path>.index, saves all index records for this small file directory in Index to the new index file, and deletes the index file with the .tmp suffix;

6.3 the small file index module clears the content of <small file directory path>.log and obtains the write operation information of this log file, ready to write log entries;

6.4 the small file index module generates an index record from the small file path name, data file name, and offset obtained from the client, searches the index data structure Index, inserts the index record at the position in the Index tree determined by the ordering of small file path names, and writes the index creation operation to the <small file directory path>.log obtained in step 6.3;

6.5 the small file index module sends the client a signal that the small file was created successfully;

Seventh step, the client's cache module modifies the update flags in the cache record corresponding to the small file's directory: the update flag of the primary data node is set to N, and the update flags of the first replica data node and the second replica data node are set to Y; go to the thirteenth step;

Eighth step, the client's SmallFileAPI derives the small file's directory from the small file path and searches the client cache module by that directory; if the data node information for the small file's directory is not found, perform the ninth step; if it is found, go to the tenth step;

Ninth step, the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module of the metadata node; the index position maintenance module looks up the index position mapping table and returns the index position mapping entry for the small file's directory to the client, and the client records the information obtained in the cache module;

Tenth step, the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small file index; if the update flag is Y, the small file index module updates the index of all small files under the small file's directory, as follows:

10.1 the small file index module reads the data of <small file directory path>.index, generates a B-tree from it, and inserts it as a node into the index data structure Index;

10.2 the small file index module reads the index operation records of <small file directory path>.log in order and re-executes the operations they describe;

10.3 the data node, through its small file index module, queries Index and returns the index record of the small file to the client;

Eleventh step, using the data file name in the small file's index record, the client's SmallFileAPI queries the client cache module to obtain the input information of the data file; if it is not there, SmallFileAPI obtains the data file input information of the small file through the file reading interface of the distributed file system and records it in the cache module;

Twelfth step, the client's SmallFileAPI reads the data of the small file from the data file, using the offset of the small file within the data file given in the small file's index record;

2. The massive small file storage method based on a master-slave distributed file system according to claim 1, characterized in that the data file is a file that actually exists in the namespace of the master-slave distributed file system and is used to store the data of all small files under the same directory; the data file consists of a data file header followed by data records; the data file header consists of four fields: the first field occupies three bytes and describes the file type, so that data files can be distinguished from other ordinary files; the second field occupies one byte and gives the version number of the data file; the third field gives the key type, that is, the data type in which the key is stored; the fourth field gives the value type, that is, the data type in which the value is stored; the data file header is immediately followed by one or more records, each record stores the complete data of one small file, and each record consists of four items: record length, key length, key, and value; the key is the file name of the small file, the value is the content of the small file, and the key length is the length of the small file's file name; each small file is one record, and when a small file is stored it is appended directly at the tail of the data file.

3. The massive small file storage method based on a master-slave distributed file system according to claim 1, characterized in that P = 32.