CN103020315B - Method for storing massive numbers of small files based on a master-slave distributed file system - Google Patents


Info

Publication number
CN103020315B
Authority
CN
China
Prior art keywords
small files
index
data
file
directory
Prior art date
Legal status
Active
Application number
CN201310009182.4A
Other languages
Chinese (zh)
Other versions
CN103020315A (en)
Inventor
王蕾
何连跃
徐叶
李姗姗
戴华东
吴庆波
丁滟
黄辰林
付松龄
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310009182.4A
Publication of CN103020315A
Application granted
Publication of CN103020315B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for storing massive numbers of small files on a master-slave distributed file system, addressing the problems that arise when such a file system stores massive numbers of small files. In the technical solution, a massive-small-file storage system is first deployed and initialized; the client's SmallFileAPI then creates or reads small files according to instructions received from the keyboard. When the system creates a small file, SmallFileAPI obtains (creating it if necessary) a data file according to the small-file path received from the client, writes the small file's data into it, and creates the small file's index on a data node. When the system reads a small file, SmallFileAPI obtains the data-node information for the file's parent directory from the small-file path, sends an index query to any one of those data nodes, and finally reads the small file's data from the data file according to the returned index information. The invention solves the problem of the huge metadata volume produced by storing massive numbers of small files, improves the write efficiency of the storage system, and preserves system reliability.

Description

Method for storing massive numbers of small files based on a master-slave distributed file system
Technical field
The present invention relates to methods for storing massive numbers of small files in master-slave distributed file systems designed for storing massive numbers of large files.
Background art
With the development of new computing technologies, the data held by enterprises and individuals alike has begun to grow rapidly. This growth raises not only storage-capacity problems but also challenges for data management and storage performance, and it has become a key problem of the cloud-computing era. To ensure high availability, high reliability, and economy, cloud computing stores data in a distributed manner and uses redundant storage to guarantee reliability. To meet the demands of large numbers of users, cloud storage technology must offer high throughput and high transfer rates. Industry and academia have proposed many solutions to the cloud data-storage problem, including the Google File System (GFS), the open-source Hadoop file system HDFS, and NoSQL storage systems for semi-structured and structured data such as Dynamo, Cassandra, and MongoDB.
In the early days of cloud computing, storage systems were designed mainly for efficient storage of and access to massive numbers of large files, and their support for small files was weak. With the development of personal terminals and the mobile Internet, however, small files make up an ever larger share of cloud storage systems, and efficient storage of and access to massive numbers of small files has become an urgent problem. A small file here is a file ranging in size from a few KB to several tens of KB. For example, Taobao must store enormous numbers of product images, all of which are small files; search engines such as Google and Baidu crawl vast numbers of web pages from the network, and these pages are also small files. Small files numbering in the trillions constitute a massive small-file collection; if such files cannot be stored efficiently, applications that face them cannot be built or cannot meet their users' requirements. The present invention mainly addresses the efficient storage of massive numbers of small files.
The architecture of a master-slave distributed file system is shown in Figure 1. Such a file system consists of one centralized metadata server (also called the metadata node) and multiple distributed data servers (also called data nodes). The metadata server manages the file system's metadata, including the directory structure and each file's storage location, size, and other attributes. The data servers store the file system's data, i.e., the files themselves. To access the file system, a client first queries the metadata server for a file's metadata and then, using that information, accesses the data server holding the file to obtain it. The advantages of this design are that it is simple to implement and easy to manage, and it achieves high fault tolerance, high reliability, and high throughput with simple techniques. Its drawback is that a single metadata server becomes both a performance bottleneck for system access and a single point of failure, while a metadata server cluster complicates metadata management and lowers the efficiency of metadata access.
Typical representatives of this kind of distributed file system are GFS, HDFS, Lustre, and PVFS. HDFS is open source and runs on commodity hardware. An HDFS cluster consists of one metadata node and multiple data nodes. HDFS is highly fault-tolerant, suitable for deployment on inexpensive machines, provides high-throughput data access, is well suited to applications on large data sets, and supports streaming reads of file-system data.
At present, master-slave distributed file systems store massive numbers of small files mainly by the following methods:
Method one is the Hadoop Archive (HAR). Because storing massive numbers of small files in HDFS exhausts the metadata node's memory, Hadoop introduced HAR archive files. HAR packs files efficiently into HDFS blocks, reducing the metadata node's memory usage while still allowing transparent access to the files; multiple small files are packed into one HAR file. HAR files have shortcomings, however. First, once created, a HAR file cannot be modified, so adding or deleting small files requires creating a new HAR file. Second, after a HAR file is created, the original small files are not deleted automatically and must be deleted manually. HAR files are therefore very inefficient for processing massive numbers of small files.
Method two is the approach proposed by Xuhui Liu, Jizhong Han, et al. of the Chinese Academy of Sciences in "Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS": small files are packed into one large file whose header stores the index information of all the small files it contains, and the large file is stored as an ordinary file of the master-slave distributed file system. To retrieve a small file, the client first queries the metadata server for the large file's metadata, then interacts with the data server to read the large file's header, obtains the index information, and finally reads the file content that follows. Because distributed file systems designed for massive large files use streaming reads, random reads are inefficient and have high latency; when multiple clients read multiple small files simultaneously, efficiency is very low and flexibility is poor.
Method three is the small-file storage optimization for HDFS published by Jiang Liu et al. of Beijing University of Posts and Telecommunications: small files are stored in data blocks on the data nodes, and each small file's metadata records information such as its position within the data block and is stored on the data nodes. The metadata of all small files is also saved on the metadata server's hard disk. To retrieve a small file, the client first checks whether the most recently accessed data node holds it; if not, it must access the metadata node, which reads its hard disk to obtain the complete metadata of the requested small file and returns it to the client, which then interacts with the data node to obtain the small file. The problem with this method is that small-file metadata access is inefficient and has long latency.
In summary, current master-slave distributed file systems are all designed for large-file storage; when used to store massive numbers of small files, they commonly suffer from low system efficiency, inefficient small-file index lookup, and poor system reliability. How to store massive numbers of small files efficiently and reliably on a master-slave distributed file system designed for massive large files is a technical problem of concern to those skilled in the art.
Summary of the invention
The technical problem to be solved by the present invention is as follows. A general master-slave distributed file system can store files of very large data scale and has high fault tolerance and good scalability, but using it to store massive numbers of small files causes problems: (1) a master-slave distributed file system with centralized metadata service has only a single metadata node, and the number of files determines the size of the metadata; massive numbers of small files consume the metadata node's memory, and their metadata can exhaust that memory and exceed what the computer hardware can provide. (2) Retrieval of massive numbers of small files is inefficient: once the volume of file data reaches a certain scale, retrieval efficiency drops sharply and the system becomes slow.
The technical solution of the present invention is as follows:
Step 1: deploy the massive-small-file storage system. The system consists of the nodes of the master-slave distributed file system together with software on those nodes for handling massive numbers of small files. The software comprises the index position maintenance module on the metadata node, the small file index module on each data node, the client cache module, and SmallFileAPI, the client's dedicated interface for operating on small files. The index position maintenance module assigns data nodes (identified by IP address and port number) to each directory, keeps the directory-to-data-node mappings sorted, and returns to the client the data nodes that manage a given small file's index, i.e., the data nodes assigned to the small file's directory. It stores these mappings in an index position mapping table, each entry of which has seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes the value "Y" or "N": "Y" means the small-file index for this directory on that data node is up to date and needs no update, and "N" means it is not up to date and must be updated. The index position maintenance module also creates a queue, waitArrangeQueue, whose entries are directory paths; it records directories that could not be assigned data nodes, and when a new data node joins the distributed file system, the module reassigns the directories in waitArrangeQueue. In addition, the module creates two empty files on disk, dirToDatanode and waitArrangeDir, which store the contents of the index position mapping table and of waitArrangeQueue respectively. The client cache module is initialized with capacity M, i.e., it can hold cache records for M directories. Each cache record stores, for a directory the user has used frequently and recently, the addresses of the data nodes managing its small-file index (i.e., the directory's index position mapping entry) and the input/output streams of its small-file data files; M is a positive integer set by the user. The small file index module receives client requests to create or query indexes, uses the update flag from the index position mapping table to decide whether small-file index data must be loaded, and returns the small-file index or the creation result to the client. At startup it creates the index data structure Index, a B-tree ordered by directory path (see R. Bayer and E. M. McCreight, "Organization and Maintenance of Large Ordered Indexes", Acta Informatica, 1972). Each node holds the set of small-file index records under one directory, which is itself a B-tree ordered by small-file path. A small-file index record represents the index of one small file and contains the small file's path name, the path of its data file, and its offset within that data file. SmallFileAPI is the client software that interacts with the data of the massive-small-file storage system, providing the operations for creating and reading small files.
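The two data structures described in Step 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the class names are hypothetical, nodes are "ip:port" strings, and, as a plainly named substitution, a dict replaces the outer B-tree (so directories are not kept in order) while a bisect-maintained sorted list replaces each per-directory B-tree.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class IndexPositionEntry:
    """One row of the index position mapping table (seven fields)."""
    directory: str
    primary_node: str      # "ip:port" of the primary data node
    primary_flag: str      # update flag, "Y" or "N"
    replica1_node: str
    replica1_flag: str
    replica2_node: str
    replica2_flag: str


@dataclass
class SmallFileIndexRecord:
    """Index of one small file: path name, data file path, offset."""
    path: str
    data_file: str
    offset: int


class TwoLevelIndex:
    """Hypothetical stand-in for the in-memory structure Index.

    Outer level: dict keyed by directory path (a B-tree in the patent).
    Inner level: list kept sorted by small-file path (also a B-tree there).
    """

    def __init__(self) -> None:
        self._dirs: dict[str, list[SmallFileIndexRecord]] = {}

    def _find(self, records, path):
        keys = [r.path for r in records]      # O(n); a real B-tree avoids this
        return bisect.bisect_left(keys, path)

    def insert(self, directory: str, record: SmallFileIndexRecord) -> None:
        records = self._dirs.setdefault(directory, [])
        i = self._find(records, record.path)
        if i < len(records) and records[i].path == record.path:
            records[i] = record               # same path: replace the entry
        else:
            records.insert(i, record)         # keep records sorted by path

    def lookup(self, directory: str, path: str) -> SmallFileIndexRecord | None:
        records = self._dirs.get(directory, [])
        i = self._find(records, path)
        if i < len(records) and records[i].path == path:
            return records[i]
        return None

    def records(self, directory: str) -> list[SmallFileIndexRecord]:
        return list(self._dirs.get(directory, []))
```

Swapping the sorted lists for a real B-tree changes only the cost of `insert` and `lookup`, not the interface.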
Step 2: initialize the massive-small-file storage system, as follows:
2.1 Initialize the index position mapping table by reading its data from the file dirToDatanode. If dirToDatanode is empty, the table is initialized as an empty table. Thereafter, whenever the table is modified, all of its data is saved back to dirToDatanode.
2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir. If waitArrangeDir is empty, waitArrangeQueue is initialized as an empty queue. Thereafter, whenever waitArrangeQueue is modified, all of its data is saved back to waitArrangeDir.
2.3 Initialize the index data structure Index as an empty B-tree. Index then loads index data from the index files and index log files on demand, according to client requests.
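Steps 2.1 and 2.2 follow a load-on-start, rewrite-on-every-change pattern. The sketch below assumes JSON as the on-disk format (the patent does not specify one) and uses a hypothetical `PersistentState` class; the write-to-temp-then-`os.replace` step is an added safeguard, not something the patent describes.

```python
import json
import os
from collections import deque


class PersistentState:
    """Sketch of steps 2.1-2.2: load the mapping table and waitArrangeQueue,
    and save the whole structure back to disk after every modification."""

    def __init__(self, table_path: str, queue_path: str) -> None:
        self.table_path = table_path      # plays the role of dirToDatanode
        self.queue_path = queue_path      # plays the role of waitArrangeDir
        self.table = self._load(table_path, {})         # directory -> entry
        self.queue = deque(self._load(queue_path, []))  # directory paths

    @staticmethod
    def _load(path, empty):
        # An empty or missing file yields an empty table / queue.
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            return empty
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    @staticmethod
    def _save(path, data) -> None:
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # swap in the new contents in one step

    def put_mapping(self, directory: str, entry: dict) -> None:
        self.table[directory] = entry
        self._save(self.table_path, self.table)

    def enqueue_waiting(self, directory: str) -> None:
        self.queue.append(directory)
        self._save(self.queue_path, list(self.queue))
```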
Step 3: the client's SmallFileAPI operates on small files according to instructions received from the keyboard: to create a small file, go to Step 4; to read a small file, go to Step 8.
Step 4: SmallFileAPI obtains from the client the path of the small file to be created, then obtains a data file under the directory indicated by that path (the small-file directory) and the index position mapping entry for that directory. A data file is an actual file in the namespace of the master-slave distributed file system that stores the data of all small files under one directory. It consists of a data file header followed by data records. The header has four fields: the first, three bytes long, describes the file type and distinguishes data files from other ordinary files; the second, one byte long, is the version number (Version) of the data file; the third gives the key type, i.e., the data type in which keys are stored; and the fourth gives the value type, i.e., the data type in which values are stored. The header is followed by one or more records, each storing the complete data of one small file. Each record consists of four parts: record length, key length, key, and value. The key is the small file's file name, the value is the small file's content, and the key length is the length of the file name. When a small file is stored, its record is appended directly to the end of the data file.
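The data file layout can be sketched with Python's `struct` module. The patent fixes only the 3-byte file-type field, the 1-byte version, and the record order (record length, key length, key, value); everything else here is an assumption: the hypothetical magic bytes `SMF`, the key and value types written as 2-byte length-prefixed UTF-8 strings, and 4-byte big-endian integers for both lengths, with record length taken to mean key bytes plus value bytes. `append_record` returns the offset that Step 5 hands back to the client.

```python
import io
import struct

MAGIC = b"SMF"   # hypothetical 3-byte file-type marker
VERSION = 1      # hypothetical 1-byte version number


def _write_str(f, s: str) -> None:
    data = s.encode("utf-8")
    f.write(struct.pack(">H", len(data)) + data)


def write_header(f, key_type: str = "string", value_type: str = "bytes") -> None:
    f.write(MAGIC)
    f.write(struct.pack(">B", VERSION))
    _write_str(f, key_type)     # third field: key type
    _write_str(f, value_type)   # fourth field: value type


def append_record(f, name: str, content: bytes) -> int:
    """Append one small file as a record; return the record's offset."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    key = name.encode("utf-8")
    # record length (key + value bytes), key length, key, value
    f.write(struct.pack(">II", len(key) + len(content), len(key)))
    f.write(key)
    f.write(content)
    return offset


def read_header(f):
    f.seek(0)
    if f.read(3) != MAGIC:
        raise ValueError("not a small-file data file")
    (version,) = struct.unpack(">B", f.read(1))
    types = []
    for _ in range(2):
        (n,) = struct.unpack(">H", f.read(2))
        types.append(f.read(n).decode("utf-8"))
    return version, types[0], types[1]
```

An `io.BytesIO` can stand in for a data file when trying it out, e.g. `write_header(buf)` followed by `append_record(buf, "a.jpg", b"...")`.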
4.1 The client's SmallFileAPI checks whether the cache module holds information about the small-file directory, namely the directory's index position mapping entry and the input/output streams of its small-file data file. If the information is found in the cache module, go to Step 5; otherwise, perform step 4.2.
4.2 The client's SmallFileAPI extracts the small-file directory path from the small file's path. If the directory does not exist, it is created, and the index position maintenance module on the metadata node assigns three data nodes to it and inserts the mapping between the directory and the three data nodes into the index position mapping table as a new entry.
The index position maintenance module assigns the three data nodes as follows:
4.2.1 The index position maintenance module randomly selects three data nodes from the information about all data nodes maintained by the metadata node (data nodes of the master-slave distributed file system register with the metadata node, so the metadata node holds information on every data node in the cluster). If three data nodes are obtained, their update flags are initialized to Y and the procedure goes to step 4.2.3; if three data nodes cannot be found, step 4.2.2 is performed.
4.2.2 The directory that could not be assigned data nodes is added to waitArrangeQueue, and the contents of waitArrangeQueue are saved back to the waitArrangeDir file. A failure signal is returned to the client, and the procedure goes to Step 13, ending the operation.
4.2.3 The index position maintenance module saves the index position mapping table back to the dirToDatanode file.
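Steps 4.2.1-4.2.3, together with the reassignment of queued directories when a node joins, might look like the sketch below. The function names `allocate_nodes` and `on_node_joined` are hypothetical, nodes are plain "ip:port" strings, and persistence to dirToDatanode / waitArrangeDir is left out (it would follow each change, as in Step 2).

```python
import random
from collections import deque


def allocate_nodes(directory, live_nodes, mapping_table, wait_queue, rng=random):
    """Pick three distinct data nodes at random for a directory.

    Returns the new mapping entry, or None if fewer than three data nodes
    are registered, in which case the directory is queued (step 4.2.2).
    """
    if len(live_nodes) < 3:
        wait_queue.append(directory)
        return None
    primary, rep1, rep2 = rng.sample(live_nodes, 3)
    entry = {
        "directory": directory,
        "primary": primary, "primary_flag": "Y",     # 4.2.1: flags start as Y
        "replica1": rep1, "replica1_flag": "Y",
        "replica2": rep2, "replica2_flag": "Y",
    }
    mapping_table[directory] = entry                 # 4.2.3 would persist this
    return entry


def on_node_joined(live_nodes, mapping_table, wait_queue, rng=random):
    """Retry every queued directory after a new data node registers."""
    pending = list(wait_queue)
    wait_queue.clear()
    for d in pending:
        allocate_nodes(d, live_nodes, mapping_table, wait_queue, rng)


table, queue = {}, deque()
allocate_nodes("/pics", ["n1", "n2"], table, queue)         # queued: only two nodes
on_node_joined(["n1", "n2", "n3"], table, queue)             # now assigned
```

`random.sample` guarantees the three chosen nodes are distinct, which matters because each replica should live on a different machine.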
4.3 Set the variable X = 1.
4.4 If the data file dataX does not exist under the small-file directory, create dataX under that directory via the metadata node.
4.5 If the data file dataX exists under the small-file directory, the client's SmallFileAPI requests the output stream of dataX from the metadata node. If the output stream is obtained, perform step 4.6. If it cannot be obtained (because dataX is currently held by another client), increment X by 1; if X <= P, go to 4.4; if X > P, return an error message to the client and go to Step 13. P is the number of data files that may be created under the directory; it is a positive integer set by the user, typically P = 32.
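The loop in steps 4.3-4.5 probes data1 through dataP until one can be opened for writing. In this sketch `FakeDFS` is a hypothetical in-memory stand-in for the distributed file system client; a real client would return an output stream rather than the path, and "leased" models a data file whose output stream another client already holds.

```python
from typing import Optional


class FakeDFS:
    """Hypothetical stand-in for a DFS client: tracks files and write leases."""

    def __init__(self) -> None:
        self.files = set()
        self.leased = set()   # files whose output stream is already held

    def exists(self, path: str) -> bool:
        return path in self.files

    def create(self, path: str) -> None:
        self.files.add(path)

    def try_open_append(self, path: str) -> Optional[str]:
        if path in self.leased:
            return None       # another client holds the output stream
        self.leased.add(path)
        return path           # a real client would return an output stream


def acquire_data_file(dfs: FakeDFS, directory: str, p: int = 32) -> Optional[str]:
    """Steps 4.3-4.5: try data1..dataP until one can be opened for writing."""
    for x in range(1, p + 1):
        path = f"{directory}/data{x}"
        if not dfs.exists(path):
            dfs.create(path)  # step 4.4: create dataX via the metadata node
        handle = dfs.try_open_append(path)
        if handle is not None:
            return handle     # proceed to step 4.6
    return None               # X > P: report an error to the client
```

Capping the search at P bounds how many concurrent writers one directory can support, since each writer holds a different dataX.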
4.6 The client's SmallFileAPI asks the index position maintenance module on the metadata node for the data nodes corresponding to the small-file directory; the module looks up the index position mapping table and returns the directory's entry to the client.
4.7 The client's SmallFileAPI records the directory's index position mapping entry and the output stream of the small file's data file in the cache module; if the cache module is full, an entry is evicted using the LRU (least recently used) algorithm.
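The client cache module of step 4.7 is a capacity-M LRU cache keyed by directory. A minimal sketch using `collections.OrderedDict`, assuming (hypothetically) that each cache record is a dict holding the mapping entry and stream handles:

```python
from collections import OrderedDict


class DirectoryCache:
    """Client cache module: at most M directory records, evicted by LRU."""

    def __init__(self, capacity_m: int) -> None:
        if capacity_m < 1:
            raise ValueError("M must be a positive integer")
        self.capacity = capacity_m
        self._items = OrderedDict()   # directory -> cache record

    def get(self, directory: str):
        if directory not in self._items:
            return None
        self._items.move_to_end(directory)      # mark as most recently used
        return self._items[directory]

    def put(self, directory: str, record: dict) -> None:
        self._items[directory] = record
        self._items.move_to_end(directory)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)     # evict least recently used
```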
Step 5: the client writes the small file's data to the data file. It appends the small file as one data record to the data file dataX obtained in step 4.5 and returns the small file's offset within the data file, i.e., the position of the record in the data file.
Step 6: the small file index module on a data node creates the small-file index. The client sends a request to create a small-file index to the primary data node among the three data nodes in the index position mapping entry obtained in step 4.6, carrying the small file's path, the name of the data file storing it, its offset within that data file, and the data node's update flag. If the primary data node has failed, a failure result is returned to the client and the procedure goes to Step 13. If the primary data node is working normally, its small file index module handles the request as follows:
6.1 The small file index module uses the primary data node's update flag to decide whether the small-file index for this directory must be updated: if the flag is Y, perform 6.2; if it is N, go to 6.3.
6.2 The small file index module reads two files in the distributed file system: /index/<small-file directory path>.index and /index/<small-file directory path>.log. The <small-file directory path>.index file, called the index file, holds all small-file index records under the directory, sorted by small-file path name and saved in B-tree form. The <small-file directory path>.log file, called the log file, holds the operation records for the directory's small-file index, such as index creation and deletion; each entry consists of an operation type and an index record, where the operation type is the action performed, such as create or delete.
The small file index module loads index data from the .index and .log files as follows:
6.2.1 The small file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into the in-memory index data structure Index.
6.2.2 The small file index module reads the index operation records in <small-file directory path>.log in order and replays them. If an operation type is create, the index information in that record is inserted into Index using the B-tree insertion algorithm.
6.2.3 The small file index module renames <small-file directory path>.index to <small-file directory path>.index.tmp, creates a new index file named <small-file directory path>.index, saves all of the directory's index records in Index to the new file, and deletes the index file with the .tmp suffix.
6.3 The small file index module clears the contents of <small-file directory path>.log and obtains a write stream for it, in preparation for logging operations.
6.4 The small file index module generates an index record from the small-file path name, data file name, and offset received from the client, searches Index, inserts the record at its position in the tree according to small-file path-name order, and writes the create-index operation to the <small-file directory path>.log opened in 6.3.
6.5 The small file index module sends the client a signal that the small file was created successfully.
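The index-creation path on the primary data node can be sketched as below, under loudly stated assumptions: a local directory stands in for the DFS path /index/, index records and log entries are JSON lines, a per-directory dict (sorted only when written to disk) replaces the B-tree, and `SmallFileIndexModule` is a hypothetical name. As a design choice of this sketch, the .log file is truncated only immediately after it has been folded into .index, so no logged create is lost if the flag is N.

```python
import json
import os


class SmallFileIndexModule:
    """Sketch of Step 6 on a data node (local files stand in for the DFS)."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.index = {}   # directory -> {small-file path: (data_file, offset)}

    def _paths(self, directory):
        base = os.path.join(self.root, directory.lstrip("/"))
        return base + ".index", base + ".log"

    def _reload(self, directory):
        """6.2.1-6.2.3: load .index, replay .log, then rewrite .index."""
        index_path, log_path = self._paths(directory)
        records = {}
        if os.path.exists(index_path):
            with open(index_path, encoding="utf-8") as f:
                for line in f:
                    r = json.loads(line)
                    records[r["path"]] = (r["data_file"], r["offset"])
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                for line in f:
                    op = json.loads(line)
                    if op["op"] == "create":
                        records[op["path"]] = (op["data_file"], op["offset"])
                    elif op["op"] == "delete":
                        records.pop(op["path"], None)
        self.index[directory] = records
        tmp = index_path + ".tmp"
        if os.path.exists(index_path):
            os.replace(index_path, tmp)         # move the old index aside
        with open(index_path, "w", encoding="utf-8") as f:
            for path in sorted(records):        # write records in path order
                data_file, offset = records[path]
                f.write(json.dumps({"path": path, "data_file": data_file,
                                    "offset": offset}) + "\n")
        if os.path.exists(tmp):
            os.remove(tmp)                      # drop the old copy
        open(log_path, "w").close()             # log now folded into .index

    def create(self, directory, path, data_file, offset, flag):
        """6.1-6.5: handle a create-index request on the primary data node."""
        index_path, log_path = self._paths(directory)
        os.makedirs(os.path.dirname(index_path), exist_ok=True)
        # Flag Y triggers a reload (6.1 -> 6.2); also reload if never loaded.
        if flag == "Y" or directory not in self.index:
            self._reload(directory)
        self.index[directory][path] = (data_file, offset)   # 6.4: in memory
        with open(log_path, "a", encoding="utf-8") as f:    # 6.4: log the op
            f.write(json.dumps({"op": "create", "path": path,
                                "data_file": data_file, "offset": offset}) + "\n")
        return True                                         # 6.5: success
```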
Step 7: the client's cache module updates the update flags in the cache record for the small-file directory: the primary data node's flag is set to N, and the flags of the first and second replica data nodes are set to Y. Go to Step 13.
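The flag update in Step 7 is a small transformation of the cached mapping entry. This sketch assumes the dict-shaped entry used in the earlier examples; the function name `mark_after_create` is hypothetical, and it returns a new dict rather than mutating the cached one.

```python
def mark_after_create(entry: dict) -> dict:
    """Step 7: after a create on the primary, set primary N and replicas Y."""
    updated = dict(entry)
    updated["primary_flag"] = "N"   # the primary just inserted the record itself
    updated["replica1_flag"] = "Y"  # per Step 10, a Y flag makes the node reload
    updated["replica2_flag"] = "Y"
    return updated
```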
Steps 4 through 7 constitute the small-file creation process of the massive-small-file storage system.
Step 8: the client's SmallFileAPI derives the small-file directory from the small-file path and searches the client cache module for it. If the data-node information for the directory is not found, perform Step 9; if it is found, go to Step 10.
Step 9: the client's SmallFileAPI sends a directory index-position query to the index position maintenance module on the metadata node; the module looks up the index position mapping table and returns the directory's index position mapping entry to the client, which records the information in the cache module.
Step 10: the client's SmallFileAPI sends a small-file index query to any one of the three data nodes. If that node's update flag is Y, the small file index module updates the index of all small files under the directory, as follows:
10.1 The small file index module reads the data of <small-file directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into Index.
10.2 The small file index module reads the index operation records in <small-file directory path>.log in order and replays them.
10.3 The small file index module queries Index, and the data node returns the small file's index record to the client.
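Step 10 has a client half (pick any one of the three data nodes and send its flag) and a node half (reload if the flag is Y, then answer from memory). In this sketch both function names are hypothetical, the entry is the dict shape used above, and `load_from_disk` is an assumed callable standing in for steps 10.1-10.2.

```python
import random


def choose_query_node(entry: dict, rng=random):
    """Client side: pick any one of the three data nodes and its flag."""
    roles = [("primary", "primary_flag"),
             ("replica1", "replica1_flag"),
             ("replica2", "replica2_flag")]
    node_key, flag_key = rng.choice(roles)
    return entry[node_key], entry[flag_key]


def query_index(node_index: dict, load_from_disk, directory: str, path: str, flag: str):
    """Node side: reload on flag Y (10.1-10.2), then look up the record (10.3)."""
    if flag == "Y" or directory not in node_index:
        node_index[directory] = load_from_disk(directory)
    return node_index[directory].get(path)
```

Spreading reads across all three nodes balances query load, at the cost of an occasional reload on a replica whose flag is Y.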
Step 11: using the data file name in the small file's index record, the client's SmallFileAPI looks up the data file's input stream in the client cache module; if it is not there, SmallFileAPI obtains the input stream through the distributed file system's file-read interface and records it in the cache module.
Step 12: the client's SmallFileAPI reads the small file's data from the data file at the offset given in the small file's index record.
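Reading a small file back (Step 12) is a seek to the indexed offset followed by one record parse. The sketch assumes the same hypothetical record encoding as the data-file example in Step 4: a 4-byte big-endian record length (key plus value bytes), a 4-byte key length, then the key and value bytes. `read_record_at` is a hypothetical name.

```python
import io
import struct


def read_record_at(f, offset: int):
    """Return (file name, content) for the record starting at `offset`."""
    f.seek(offset)
    record_len, key_len = struct.unpack(">II", f.read(8))
    name = f.read(key_len).decode("utf-8")
    content = f.read(record_len - key_len)
    return name, content
```

Only the record is read, never the whole data file, which is what makes the per-directory data file workable for random small-file reads.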
Steps 8 through 12 constitute the small-file read process of the massive-small-file storage system.
Step 13: the client's SmallFileAPI checks whether there is another instruction to process; if so, go to Step 3; otherwise, end.
The present invention is a method for storing massive numbers of small files based on a master-slave distributed file system, and it achieves the following technical effects:
(1) Step 5 stores small-file data in the master-slave distributed file system, which provides distributed storage and fault tolerance, achieving large-scale storage and data reliability.
(2) Step 6 distributes small-file indexes across the data nodes for management, solving the single-metadata-node problem when storing massive numbers of small files; at the same time, step 6.2 stores the small-file indexes in the distributed file system, so the file system's own fault-tolerance mechanism protects them and reduces the risk of index loss.
(3) Building on the above, the client cache module caches the index position information and data file information of the small files that users access frequently, avoiding frequent interaction with the metadata node and greatly improving system performance.
Experiments show that the present invention effectively solves the problem of huge metadata volume when storing massive numbers of small files, greatly improves the write efficiency of the storage system, and, through fault tolerance of the small-file index, ensures system reliability.
Description of the drawings
Fig. 1 Structure of a master-slave distributed file system (background art);
Fig. 2 Overall structure of the massive-small-file storage system deployed in Step 1 of the present invention;
Fig. 3 Overall flowchart of the present invention;
Fig. 4 Structure of the index position mapping table of the present invention;
Fig. 5 Structure of the data file created in Step 4 of the present invention;
Fig. 6 Structure of the small-file index record generated by the small file index module in step 6.4 of the present invention.
Embodiment
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 shows the structure of a master-slave distributed file system.
Fig. 2 shows the overall architecture of the massive small-file storage system built in step 1 of the invention. The system consists of the nodes of the master-slave distributed file system together with software, installed on those nodes, for handling massive numbers of small files. This software comprises an index position maintenance module on the metadata node, a small-file index module on each data node, a client cache module, and SmallFileAPI, the dedicated interface through which clients operate on small files.

The index position maintenance module assigns data nodes (identified by IP address and port number) to each directory, maintains the directory-to-data-node mappings, and returns to the client the data nodes that manage a given small file's index, that is, the data nodes assigned to that small file's directory. It records these mappings in an index position mapping table. As shown in Fig. 4, each entry of the table has seven fields: directory, primary data node, primary update flag, first replica data node, first replica update flag, second replica data node, and second replica update flag. Each update flag takes the value "Y" or "N": "Y" means the small-file index for this directory held on that data node is not current and must be updated, and "N" means it is current and needs no update. The module also creates a queue, waitArrangeQueue, whose entries are directory paths. It records the directories for which data nodes could not be assigned, and when a new data node joins the distributed file system, the module reassigns the directories in waitArrangeQueue. In addition, the module creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index position mapping table, and waitArrangeDir stores the contents of waitArrangeQueue.

The client cache module is initialized with a capacity of M, meaning it can hold cache records for M directories. Each cache record stores, for a directory the user has used recently, the addresses of the data nodes that manage its small-file index (that is, the directory's index position mapping entry) together with the input/output information for the data file holding its small files. M is a positive integer set by the user.

The small-file index module receives client requests to create or query an index, uses the update flag from the index position mapping table to decide whether it must load the small-file index data, and returns the index, or the result of the creation, to the client. When the small-file index module starts, it creates an in-memory index data structure, Index: a B-tree ordered by directory path, each of whose nodes is the set of small-file index records under one directory, itself also a B-tree ordered by small-file path. A small-file index record represents the index of one small file and contains the small file's path name, the path of the data file holding it, and its offset within that data file. SmallFileAPI is the software through which the client interacts with the data in the massive small-file storage system; it provides the operations for creating and reading small files.
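The assignment behaviour described above, combined with the random selection of claim 1 (step 4.2.1) and the fallback to waitArrangeQueue (step 4.2.2), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `MappingEntry`, `IndexPositionMaintainer`, `allocate` and `add_data_node` are hypothetical, node identifiers are assumed to be `"ip:port"` strings, and persistence to dirToDatanode and waitArrangeDir is omitted.

```python
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical model of one row of the index position mapping table (Fig. 4).
# Flags: "Y" = index on that node must be reloaded, "N" = index is current.
@dataclass
class MappingEntry:
    directory: str
    primary: str
    primary_flag: str
    replica1: str
    replica1_flag: str
    replica2: str
    replica2_flag: str


class IndexPositionMaintainer:
    """Sketch of the index position maintenance module on the metadata node."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)  # all data nodes the metadata node knows
        self.table = {}                     # directory -> MappingEntry
        self.wait_arrange_queue = deque()   # directories still waiting for nodes

    def allocate(self, directory):
        """Pick three distinct data nodes at random, or queue the directory.
        Returns the new entry, or None if too few data nodes exist."""
        if len(self.data_nodes) < 3:
            self.wait_arrange_queue.append(directory)
            return None
        a, b, c = random.sample(self.data_nodes, 3)
        # step 4.2.1 initializes all three update flags to "Y"
        entry = MappingEntry(directory, a, "Y", b, "Y", c, "Y")
        self.table[directory] = entry
        return entry

    def add_data_node(self, node):
        """A new data node joined: retry every queued directory once."""
        self.data_nodes.append(node)
        pending = list(self.wait_arrange_queue)
        self.wait_arrange_queue.clear()
        for directory in pending:
            self.allocate(directory)  # re-queues itself if still short of nodes
```

A directory that cannot obtain three distinct nodes simply waits in the queue; `add_data_node` drains the queue and retries each waiting directory.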
Fig. 3 is the overall flowchart of the invention.
Step 1: deploy the massive small-file storage system.
Step 2: initialize the massive small-file storage system.
Step 3: select an operation on small files: to create a small file, go to step 4; to read one, go to step 8.
Step 4: the client's SmallFileAPI creates the data file that will store the small file, based on the path of the small file being created.
Step 5: the client's SmallFileAPI writes the small file's data into the data file.
Step 6: the small-file index module creates the small file's index.
Step 7: the client modifies the primary data node's update flag in the cache record for the small file's directory in the cache module. Go to step 13.
Step 8: the client's SmallFileAPI derives the small file's directory from its path and looks up the directory's data node information in the cache module. If it is not found, perform step 9; if it is found, go to step 10.
Step 9: the client's SmallFileAPI sends a query for the directory's index position to the index position maintenance module on the metadata node, which returns the three data nodes and their update flags; the client records this information in the cache module.
Step 10: the client's SmallFileAPI sends a small-file index query to any one of the three data nodes.
Step 11: using the data file name in the small file's index record, the client's SmallFileAPI looks up the data file's input information in the cache module; if it is not there, it obtains that information through the distributed file system's read-file interface and records it in the cache module.
Step 12: the client reads the small file's data from the data file, using the small file's offset within the data file given in its index record.
Step 13: the client's SmallFileAPI checks whether further instructions are being entered from the keyboard; if so, go to step 3; if not, end.
Fig. 4 shows the structure of the mapping table between small-file directories and data nodes. Each directory maps to three data nodes, each giving the location of a data node. The update flag following each data node indicates whether the small-file index currently in that data node's memory needs to be updated: "Y" means it needs updating, and "N" means it does not.
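The flag protocol can be illustrated with two small helper functions. They are a hypothetical sketch: the dict keys `"primary"`, `"replica1"` and `"replica2"` are invented names for the three roles, and the behaviour follows step 7 of claim 1 (after a create, the primary becomes N and both replicas become Y) and steps 6.1 and 10 (a node rebuilds its index only when it receives Y).

```python
def mark_after_create(flags):
    """Step 7: after the primary data node has indexed a new small file,
    mark the primary as current and both replicas as stale.
    `flags` maps the hypothetical role names to "Y" or "N"."""
    updated = dict(flags)      # leave the caller's record untouched
    updated["primary"] = "N"
    updated["replica1"] = "Y"
    updated["replica2"] = "Y"
    return updated


def needs_reload(flag):
    """Steps 6.1 and 10: a data node reloads the directory's index from its
    .index and .log files only when the flag it receives is "Y"."""
    return flag == "Y"
```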
Fig. 5 shows the structure of the data file in which small files are stored. A data file consists of a data file header followed by data records. The header has four fields: the first, three bytes long, describes the file type and distinguishes data files from ordinary files; the second, one byte long, is the data file's version number (Version); the third gives the key type, i.e. the data type in which keys are stored; and the fourth gives the value type, i.e. the data type in which values are stored. The header is followed immediately by one or more records, each storing the complete data of one small file. Each record consists of four parts: record length, key length, key, and value. The key is the small file's name, the value is the small file's content, and the key length is the length of the small file's name.
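The layout above can be sketched with Python's `struct` module. The text fixes only the sizes of the first two header fields (a 3-byte type tag and a 1-byte version); the magic value `b"SFD"`, the 1-byte key and value type codes, the 4-byte big-endian length fields, and the choice that record length counts key plus value bytes are all assumptions made for illustration, as are the function names.

```python
import struct

# Assumed encoding of the Fig. 5 data file (only the first two header
# field sizes come from the text; everything else is hypothetical).
MAGIC = b"SFD"                    # hypothetical 3-byte file-type tag
VERSION = 1
TYPE_BYTES = 1                    # hypothetical type code for raw bytes
HEADER = struct.Struct(">3sBBB")  # type tag, version, key type, value type
REC_HEAD = struct.Struct(">II")   # record length, key length


def new_data_file():
    """Return the bytes of an empty data file (header only)."""
    return bytearray(HEADER.pack(MAGIC, VERSION, TYPE_BYTES, TYPE_BYTES))


def append_record(buf, filename, content):
    """Append one small file at the tail of the data file; return its offset."""
    key = filename.encode("utf-8")
    offset = len(buf)
    # record length here counts key + value bytes (an assumption)
    buf += REC_HEAD.pack(len(key) + len(content), len(key))
    buf += key + content
    return offset


def read_record(buf, offset):
    """Read back (filename, content) from the record starting at `offset`."""
    rec_len, key_len = REC_HEAD.unpack_from(buf, offset)
    start = offset + REC_HEAD.size
    key = bytes(buf[start:start + key_len]).decode("utf-8")
    value = bytes(buf[start + key_len:start + rec_len])
    return key, value
```

Because records are only ever appended, the offset returned by `append_record` is exactly what an index record needs in order to read the small file back later.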
Fig. 6 shows the structure of the index record generated by the small-file index module in step 6.4. Each index record contains the small file's path, the data file name, and the small file's offset within the data file.
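The two-level Index structure (directory path, then small-file path) can be approximated as follows. This sketch deliberately swaps in sorted Python lists searched with `bisect` for the B-trees in the text: lookups and ordering behave the same, but the performance characteristics do not. The class names are hypothetical, and replacing an existing record that has the same path is an assumption the text does not address.

```python
import bisect
from dataclasses import dataclass


@dataclass
class IndexRecord:
    path: str       # small-file path name (the sort key)
    data_file: str  # data file that holds the small file
    offset: int     # offset of the small file's record in that data file


class DirectoryIndex:
    """Directory path -> records ordered by small-file path.
    Sorted lists stand in for the B-trees described in the text."""

    def __init__(self):
        self.dirs = []    # directory paths, kept sorted
        self.by_dir = {}  # directory -> (sorted keys, records in same order)

    def insert(self, directory, record):
        if directory not in self.by_dir:
            bisect.insort(self.dirs, directory)
            self.by_dir[directory] = ([], [])
        keys, recs = self.by_dir[directory]
        i = bisect.bisect_left(keys, record.path)
        if i < len(keys) and keys[i] == record.path:
            recs[i] = record  # assumption: same path replaces the old record
        else:
            keys.insert(i, record.path)
            recs.insert(i, record)

    def lookup(self, directory, path):
        keys, recs = self.by_dir.get(directory, ([], []))
        i = bisect.bisect_left(keys, path)
        if i < len(keys) and keys[i] == path:
            return recs[i]
        return None

    def records_in(self, directory):
        """One directory's records in path order (what step 6.2.3 writes back)."""
        return list(self.by_dir.get(directory, ([], []))[1])
```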

Claims (3)

1. A massive small-file storage method based on a master-slave distributed file system, characterized in that it comprises the following steps:
Step 1: deploy the massive small-file storage system. The system consists of the nodes of the master-slave distributed file system together with software, on those nodes, for handling massive numbers of small files; this software comprises an index position maintenance module on the metadata node, a small-file index module on each data node, a client cache module, and SmallFileAPI, the client's dedicated interface for operating on small files; the index position maintenance module assigns data nodes to each directory, maintains the directory-to-data-node mappings, and returns to the client the data nodes that manage a small file's index, that is, the data nodes assigned to the small file's directory; the index position maintenance module records the directory-to-data-node mappings in an index position mapping table, each entry of which has seven fields: directory, primary data node, primary update flag, first replica data node, first replica update flag, second replica data node, and second replica update flag; each of the three update flags takes the value "Y" or "N"; "Y" means the small-file index for this directory on that data node is not current and must be updated, and "N" means it is current and needs no update; the index position maintenance module creates a queue, waitArrangeQueue, whose entries are directory paths, to record directories that could not be assigned data nodes, and when a new data node joins the distributed file system the module reassigns the directories in waitArrangeQueue; the index position maintenance module also creates two empty files on disk, dirToDatanode and waitArrangeDir; dirToDatanode stores the contents of the index position mapping table, and waitArrangeDir stores the contents of waitArrangeQueue; the client cache module is initialized with a capacity of M, each cache record storing a directory's index position mapping entry and the input/output information for the data file holding its small files, M being a positive integer; the small-file index module receives client requests to create or query an index, uses the update flag in the index position mapping table to decide whether the small-file index data must be loaded, and returns the small-file index, or the result of the creation, to the client; when the small-file index module starts, it creates the small-file index data structure Index, a B-tree ordered by directory path, each of whose nodes is the set of small-file index records under one directory, itself also a B-tree ordered by small-file path; a small-file index record represents the index of one small file and contains the small file's path name, the path of its data file, and its offset within that data file; SmallFileAPI is the software through which the client interacts with the data in the massive small-file storage system, and provides the operations for creating and reading small files;
Step 2: initialize the massive small-file storage system, as follows:
2.1 Initialize the index position mapping table by reading its data from the file dirToDatanode; if dirToDatanode is empty, the index position mapping table is initialized as an empty table;
2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir; if waitArrangeDir is empty, waitArrangeQueue is initialized as an empty queue;
2.3 Initialize the index data structure Index as an empty B-tree;
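Step 2 leaves the on-disk format of dirToDatanode and waitArrangeDir unspecified. Assuming, purely for illustration, one tab-separated row of the seven Fig. 4 fields per line and one directory path per line, initialization reduces to parsing text in which an empty file yields an empty table or queue. All function names below are hypothetical.

```python
from pathlib import Path

# Assumed layout (not given in the text): dirToDatanode holds one row per
# directory with the seven Fig. 4 fields separated by tabs; waitArrangeDir
# holds one directory path per line.
FIELDS = 7


def parse_mapping_table(text):
    """Step 2.1: directory -> [primary, flag, replica1, flag, replica2, flag].
    Empty text yields an empty table."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        cols = line.split("\t")
        if len(cols) != FIELDS:
            raise ValueError(f"malformed mapping row: {line!r}")
        table[cols[0]] = cols[1:]
    return table


def dump_mapping_table(table):
    """Inverse of parse_mapping_table, e.g. for saving in step 4.2.3."""
    return "".join("\t".join([d, *cols]) + "\n" for d, cols in table.items())


def parse_wait_queue(text):
    """Step 2.2: one directory per line; empty text yields an empty queue."""
    return [line for line in text.splitlines() if line.strip()]


def load_mapping_table(path):
    """Read dirToDatanode from local disk, treating a missing file as empty."""
    p = Path(path)
    return parse_mapping_table(p.read_text(encoding="utf-8") if p.exists() else "")
```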
Step 3: the client's SmallFileAPI operates on small files according to instructions received from the keyboard: to create a small file, perform step 4; to read a small file, go to step 8;
Step 4: SmallFileAPI obtains from the client the path of the small file to be created, then obtains the index position mapping entry for the directory named in that path and the data file under that directory, as follows:
4.1 The client's SmallFileAPI queries the cache module for information about the small file's directory, namely the directory's index position mapping entry and the input/output information for its small-file data file; if this information is found in the cache module, go to step 5; if not, perform step 4.2;
4.2 The client's SmallFileAPI extracts the directory path from the small file's path; if the directory does not exist, it creates the directory, and at the same time the index position maintenance module on the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as a new entry; the index position maintenance module assigns the three data nodes as follows:
4.2.1 The index position maintenance module randomly selects three data nodes from all the data node information maintained by the metadata node; if three data nodes are obtained, their update flags are initialized to Y and the module goes to step 4.2.3; if three data nodes cannot be found, it performs step 4.2.2;
4.2.2 Add the directory that could not be assigned data nodes to waitArrangeQueue, save the contents of waitArrangeQueue to the file waitArrangeDir, return an operation-failure signal to the client, and go to step 13 to end the operation;
4.2.3 The index position maintenance module saves the index position mapping table to the file dirToDatanode;
4.3 Set the variable X = 1;
4.4 If the data file dataX does not exist under the small file's directory, the metadata node creates dataX under that directory;
4.5 If the data file dataX exists under the small file's directory, the client's SmallFileAPI requests the output information for dataX from the metadata node; if the output information is obtained, perform step 4.6; if not, increment X by 1; if X <= P, go to 4.4; if X > P, return an error message to the client and go to step 13; P is the number of data files that may be created under this directory, a positive integer set by the user;
4.6 The client's SmallFileAPI asks the index position maintenance module on the metadata node for the data nodes corresponding to the small file's directory; the module searches the index position mapping table and returns the directory's entry to the client;
4.7 The client's SmallFileAPI records the directory's index position mapping entry and the small file's data file output information in the cache module; if the cache module is full, an entry is evicted using the LRU (least recently used) algorithm;
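The cache of step 4.7 holds at most M directory records and evicts the least recently used one when full. A minimal sketch using `collections.OrderedDict` follows; the class name `ClientCache` and the opaque `record` values (standing for a mapping entry plus data-file stream information) are hypothetical.

```python
from collections import OrderedDict


class ClientCache:
    """Client cache module: at most M records, one per directory, with
    least-recently-used eviction (step 4.7)."""

    def __init__(self, capacity_m):
        if capacity_m < 1:
            raise ValueError("M must be a positive integer")
        self.capacity = capacity_m
        self._records = OrderedDict()  # directory -> cache record

    def get(self, directory):
        record = self._records.get(directory)
        if record is not None:
            self._records.move_to_end(directory)  # now most recently used
        return record

    def put(self, directory, record):
        if directory in self._records:
            self._records.move_to_end(directory)
        self._records[directory] = record
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)  # evict least recently used
```

Reading a record with `get` refreshes its position, so directories in active use survive eviction.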
Step 5: the client writes the small file as a data record into the data file dataX obtained in step 4.5 and returns the small file's offset within the data file, i.e. the position of the data record in the data file;
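Steps 4.3 to 4.5 and step 5 amount to: use the first dataX that can be opened for writing, creating it if needed, and append at its tail. The sketch below models the distributed file system with a plain dict of bytearrays and a set of paths whose output information cannot currently be obtained. These stand-ins and the function names are hypothetical, the text does not say why obtaining output information can fail, and the Fig. 5 header and record encoding are left out so that the offset arithmetic stays visible. P defaults to 32, as in claim 3.

```python
P = 32  # claim 3 sets P = 32


def choose_data_file(fs, locked, directory, p=P):
    """Steps 4.3-4.5: try data1 .. dataP in order.
    fs: hypothetical dict path -> bytearray standing in for the DFS.
    locked: paths whose output information cannot be obtained."""
    for x in range(1, p + 1):
        path = f"{directory}/data{x}"
        if path not in fs:
            fs[path] = bytearray()  # step 4.4: create dataX
        if path not in locked:      # step 4.5: output information obtained
            return path
    return None                     # X > P: report an error to the client


def write_small_file(fs, locked, directory, content):
    """Step 5: append the small file at the tail; return (data file, offset)."""
    path = choose_data_file(fs, locked, directory)
    if path is None:
        raise RuntimeError("no writable data file under " + directory)
    offset = len(fs[path])  # the new data starts at the current end of file
    fs[path] += content
    return path, offset
```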
Step 6: the small-file index module on the data node creates the small file's index. The client sends the small file's path, the name of the data file storing it, its offset within that data file, and the data node update flag to the primary data node among the three data nodes in the index position mapping entry obtained in step 4.6, and requests creation of the small-file index; if the primary data node has failed, a failure result is returned to the client and control goes to step 13; if the primary data node is working normally, after it receives the request its small-file index module proceeds as follows:
6.1 Based on the primary data node's update flag, the small-file index module decides whether the small-file index for this directory must be updated: if the flag is Y, perform 6.2; if it is N, go to 6.3;
6.2 The small-file index module reads two files in the distributed file system: /index/<directory path>.index and /index/<directory path>.log, where <directory path> is the small file's directory path. The .index file, called the index file, holds all small-file index records under the directory, stored in a B-tree structure ordered by small-file path name. The .log file, called the log file, holds the records of operations on the small-file index under the directory, each consisting of an operation type and an index record. The index data is loaded from the .index and .log files as follows:
6.2.1 The small-file index module reads the data in <directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into the in-memory index data structure Index;
6.2.2 The small-file index module reads the index operation records in <directory path>.log in order and re-applies each operation;
6.2.3 The small-file index module renames <directory path>.index to <directory path>.index.tmp, creates a new index file named <directory path>.index, saves all index records for this directory in Index into the new index file, and then deletes the index file with the .tmp suffix;
6.3 The small-file index module clears the contents of <directory path>.log and obtains the write operation information for that log file, in preparation for logging operations;
6.4 The small-file index module generates an index record from the small-file path name, data file name, and offset received from the client, searches the index data structure Index, inserts the record at the position in the Index tree determined by small-file path name order, and writes the index-creation operation to the <directory path>.log file obtained in 6.3;
6.5 The small-file index module sends the client a signal that the small file was created successfully;
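Step 6 on the primary data node (rebuild when the flag is Y, snapshot, then insert and log) can be modelled in memory as follows. This is a hypothetical sketch: the .index file is a list of `(path, data_file, offset)` tuples sorted by path, the .log file is a list of `(op, record)` pairs with only the "create" operation modelled, a dict stands in for the B-tree, and the tmp-file rename of 6.2.3 is omitted. The sketch clears the log only right after a fresh snapshot has been written; read literally, step 6.3 also empties the log when the flag is N, which would discard operations not yet in the .index file, so that reading is not modelled here.

```python
# Hypothetical in-memory state for one directory on the primary data node:
#   index_file: list of (path, data_file, offset) tuples sorted by path
#   log_file:   list of (op, (path, data_file, offset)) entries
#   records:    dict path -> (data_file, offset), standing in for the B-tree


def rebuild(index_file, log_file):
    """Steps 6.2.1-6.2.2: load the .index snapshot, then replay the .log."""
    records = {path: (data_file, offset) for path, data_file, offset in index_file}
    for op, (path, data_file, offset) in log_file:
        if op == "create":  # only creation is modelled here
            records[path] = (data_file, offset)
    return records


def create_index(state, flag, path, data_file, offset):
    """Handle one index-creation request carrying the primary's update flag."""
    if flag == "Y":  # 6.1: the in-memory index is stale
        state["records"] = rebuild(state["index_file"], state["log_file"])
        # 6.2.3: write a fresh snapshot in path order (tmp-file rename omitted)
        state["index_file"] = sorted(
            (p, d, o) for p, (d, o) in state["records"].items())
        state["log_file"] = []  # start a new log after the snapshot
    state["records"][path] = (data_file, offset)                     # 6.4: insert
    state["log_file"].append(("create", (path, data_file, offset)))  # 6.4: log it
    return True                                                      # 6.5: success
```

A replica that later receives flag Y (step 10) can call `rebuild` on the same two files and arrive at the same records as the primary.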
Step 7: the client's cache module updates the update flags in the cache record for the small file's directory: the primary data node's flag is set to N, and the flags of the first and second replica data nodes are set to Y; go to step 13;
Step 8: the client's SmallFileAPI derives the small file's directory from its path and searches the client cache module for that directory; if the directory's data node information is not found, perform step 9; if it is found, go to step 10;
Step 9: the client's SmallFileAPI sends a query for the directory's index position to the index position maintenance module on the metadata node; the module searches the index position mapping table and returns the directory's index position mapping entry to the client, which records it in the cache module;
Step 10: the client's SmallFileAPI sends a small-file index query to any one of the three data nodes; if that node's update flag is Y, its small-file index module updates the index of all small files under the directory, as follows:
10.1 The small-file index module reads the data in <directory path>.index, builds a B-tree from it, and inserts that B-tree as a node into the index data structure Index;
10.2 The small-file index module reads the index operation records in <directory path>.log in order and re-applies each operation;
10.3 The data node's small-file index module queries Index and returns the small file's index record to the client;
Step 11: using the data file name in the small file's index record, the client's SmallFileAPI queries the client cache module for the data file's input information; if it is not there, it obtains the input information for the small file's data file through the distributed file system's read-file interface and records it in the cache module;
Step 12: the client's SmallFileAPI reads the small file's data from the data file, using the small file's offset within the data file given in its index record;
Step 13: the client's SmallFileAPI checks whether further instructions remain to be entered; if so, go to step 3; if not, end.
2. The massive small-file storage method based on a master-slave distributed file system according to claim 1, characterized in that the data file is a file that physically exists in the namespace of the master-slave distributed file system and stores the data of all small files under the same directory; a data file consists of a data file header followed by data records; the header has four fields: the first, three bytes long, describes the file type and distinguishes data files from ordinary files; the second, one byte long, is the data file's version number; the third gives the key type, i.e. the data type in which keys are stored; the fourth gives the value type, i.e. the data type in which values are stored; the header is followed immediately by one or more records, each storing the complete data of one small file, and each consisting of four parts: record length, key length, key, and value; the key is the small file's name, the value is the small file's content, and the key length is the length of the small file's name; each small file is stored as one record appended directly to the tail of the data file.
3. The massive small-file storage method based on a master-slave distributed file system according to claim 1, characterized in that P = 32.
CN201310009182.4A 2013-01-10 2013-01-10 A massive small-file storage method based on a master-slave distributed file system Active CN103020315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310009182.4A CN103020315B (en) 2013-01-10 2013-01-10 A massive small-file storage method based on a master-slave distributed file system

Publications (2)

Publication Number Publication Date
CN103020315A 2013-04-03
CN103020315B 2015-08-19

Family

ID=47968918

Country Status (1)

Country Link
CN (1) CN103020315B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant