CN103020315A - Method for storing mass of small files on basis of master-slave distributed file system - Google Patents


Publication number
CN103020315A
Authority
CN
China
Prior art keywords
small files
index
data
file
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100091824A
Other languages
Chinese (zh)
Other versions
CN103020315B (en
Inventor
王蕾
何连跃
徐叶
李姗姗
戴华东
吴庆波
丁滟
黄辰林
付松龄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201310009182.4A priority Critical patent/CN103020315B/en
Publication of CN103020315A publication Critical patent/CN103020315A/en
Application granted granted Critical
Publication of CN103020315B publication Critical patent/CN103020315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for storing massive numbers of small files on a master-slave distributed file system, and aims to solve the problem of storing massive small files in such a system. In the technical scheme, a massive small file storage system is first deployed and initialized; the client's SmallFileAPI (small file application program interface) then creates and reads small files according to commands received from the keyboard. When a small file is created, SmallFileAPI creates the data file for the small file according to the small file path obtained from the client, writes the small file data, and at the same time creates the small file index on a data node. When a small file is read, the data node information for the parent directory is obtained from the small file path, an index request is sent to any one of those data nodes, and the small file data is finally read from the data file according to the index information. The method addresses the problem of the huge volume of metadata produced by massive small files, improves the write efficiency of the massive small file storage system, and ensures the reliability of the system.

Description

A method for storing massive small files based on a master-slave distributed file system
Technical field
The present invention relates to a method for storing massive numbers of small files on a master-slave distributed file system that was designed for storing massive large files.
Background art
With the development of new computing technologies, both enterprise and personal data have begun to grow rapidly. The growth of massive data brings not only a storage capacity problem but also challenges for data management and storage performance, and has become a key problem that the cloud computing era must solve. To ensure high availability, high reliability, and economy of data, cloud computing stores data in a distributed manner and uses redundant storage to guarantee data reliability. To meet the needs of a large number of users, cloud storage technology must provide high throughput and high transfer rates. Industry and academia have proposed many solutions to the cloud data storage problem, including the Google file system GFS, the open-source Hadoop file system HDFS, and NoSQL storage systems for semi-structured and structured data such as Dynamo, Cassandra, and MongoDB.
In the early days of cloud computing, storage systems were designed mainly for efficient storage of and access to massive large files, and their support for small files was weak. With the development of personal terminals and the mobile Internet, however, the proportion of small files in cloud storage systems keeps rising, and efficient storage of and access to massive small files has become an urgent problem. A small file is a file ranging in size from a few KB to tens of KB. For example, Taobao needs to store a massive number of product pictures, all of which are small files; search engines such as Google and Baidu need to crawl web pages numbering from tens of thousands up to hundreds of millions, and these pages are all small files. Small files numbering on the order of trillions constitute massive small files. If massive small files cannot be stored efficiently, applications that face massive small files either cannot be realized or cannot meet customer requirements. The present invention mainly addresses the efficient storage of massive small files.
The architecture of a master-slave distributed file system is shown in Fig. 1. A distributed file system of this type consists of one centralized metadata server (also called the metadata node) and multiple distributed data servers (also called data nodes). The metadata server manages the metadata of the file system, including the directory structure and the storage location, size, and attributes of each file. The data servers store the data of the file system, that is, the files themselves. When a client accesses a master-slave distributed file system, it first accesses the metadata server to obtain the metadata of the file and then, according to this information, accesses the data server that stores the file to obtain it. The advantage of this type of file system is that it is simple to design and implement, easy to manage, and can achieve high fault tolerance, high reliability, and high throughput with simple techniques. The disadvantage is that a single metadata server becomes the performance bottleneck of system access and is prone to single point failure, whereas a metadata server cluster complicates metadata management and reduces the efficiency of metadata access.
Typical representatives of this type of distributed file system are the Google file system GFS, the Hadoop file system HDFS, Lustre, and PVFS. Of these, HDFS is open source and runs on commodity hardware platforms. A cluster running HDFS consists of one metadata node and multiple data nodes. HDFS is a highly fault-tolerant system suited to deployment on inexpensive machines; it provides high-throughput data access, is well suited to applications with large data sets, and supports streaming access to file system data.
At present, the main methods for storing massive small files on a master-slave distributed file system are as follows:
Method one is the Hadoop archive file, HAR (Hadoop Archive) for short. To solve the problem that storing massive small files in HDFS exhausts the memory of the metadata node, Hadoop introduced the HAR archive file. HAR packs files into HDFS blocks efficiently, reducing metadata node memory usage while still allowing transparent access to the files. The method packs multiple small files into one HAR file. HAR files have several shortcomings, however. First, once created, a HAR file cannot be modified; to add or delete small files, a new HAR file must be created. Second, after the HAR file has been created, the original small files are not deleted automatically and must be deleted manually. HAR files are therefore very inefficient for handling massive small files.
Method two is the method proposed by Xuhui Liu, Jizhong Han, et al. of the Chinese Academy of Sciences in "Implementing WebGIS on Hadoop: A Case Study of Improving Small File I/O Performance on HDFS": small files are packed into one large file, and the header of the large file stores the index information of all the small files in it. The large file is stored as an ordinary file of the master-slave distributed file system. For each small file retrieval, the client first queries the metadata server to obtain the metadata of the large file, then interacts with the data server to read the large file, obtains the index information from the header, and finally reads the file content that follows. Because distributed file systems designed for massive large files mostly use streaming reads, random reads are inefficient and have high latency; when multiple clients read multiple small files at the same time, efficiency is very low and flexibility is poor.
Method three is the method disclosed by Jiang Liu et al. of Beijing University of Posts and Telecommunications in their study of small file storage optimization techniques under HDFS: small files are stored in the data blocks of data nodes, and the metadata of each small file records information such as its position in the data block and is stored on the data node. The metadata of all small files is also kept on the hard disk of the metadata server. For each small file retrieval, the client first checks whether a recently accessed data node holds the small file; if so, the metadata node need not be accessed. Otherwise, the metadata server must read the hard disk to obtain the complete metadata of the requested small file and return it to the client, which then interacts with the data node to obtain the small file. The problem with this method is that metadata access for small files is inefficient and has long latency.
In summary, existing master-slave distributed file systems are all built on large-file storage methods; when used to store massive small files, they commonly suffer from low system efficiency, low small file index lookup efficiency, and poor system reliability. How to store massive small files efficiently and reliably on a master-slave distributed file system designed for massive large files is a technical problem of concern to those skilled in the art.
Summary of the invention
The technical problem to be solved by the present invention is as follows. A general master-slave distributed file system can store files of very large data scale and has high fault tolerance and high scalability, but storing massive small files on it raises several problems: (1) A master-slave distributed file system with a centralized metadata service has only a single metadata node, and the number of files determines the scale of the metadata. Massive small files consume the memory of the metadata node, and their metadata can exhaust that memory and exceed the limit that the computer hardware can reach. (2) Retrieval of massive small files is inefficient: once the number of files reaches a certain scale, file retrieval efficiency drops sharply and the system runs slowly.
The technical scheme of the present invention is:
In the first step, the massive small file storage system is deployed. The massive small file storage system consists of a master-slave distributed file system and the software on each node of that file system for handling massive small files. This software comprises the index position maintenance module on the metadata node, the small file index module on each data node, the client cache module, and SmallFileAPI, the client's dedicated interface for operating on small files.
The index position maintenance module allocates data nodes (identified by IP address and port number) to each directory, sorts the mappings between directories and data nodes, and returns to the client the data nodes that manage the index of a small file, that is, the data nodes allocated to the small file's directory. The index position maintenance module stores the mappings between directories and data nodes in an index position mapping table. Each entry of the index position mapping table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes one of two values, "Y" or "N". "Y" indicates that the small file index for this directory on the data node is up to date and does not need updating; "N" indicates that it is not up to date and needs updating. The index position maintenance module creates a queue, waitArrangeQueue, whose entries are directory paths; it records the directories to which data nodes could not be allocated, so that when a new data node joins the distributed file system the module can reallocate the directories in waitArrangeQueue. The index position maintenance module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the content of the index position mapping table, and waitArrangeDir stores the content of the queue waitArrangeQueue.
The capacity of the client cache module is initialized to M, meaning that it can store the cache records for M directories. Each cache record stores the addresses of the data nodes managing the small file index that the user has used most recently (that is, the index position mapping entry for the directory) and the input/output information of the data files of the small files. M is a positive integer set by the user as needed.
The small file index module receives index creation or query requests from the client, decides according to the update flag in the index position mapping table whether the small file index data needs to be loaded, and returns to the client either the small file index or the result indicating whether creation succeeded. At startup the small file index module creates the small file index data structure Index, a B-tree (see R. Bayer and E. M. McCreight, "Organization and Maintenance of Large Ordered Indexes", Acta Informatica, 1972) sorted by directory path. Each of its nodes is the set of small file index records under one directory, which is itself a B-tree sorted by small file name. A small file index record represents the index of one small file and contains the path name of the small file, the path of its data file, and its offset within the data file.
SmallFileAPI is the software through which the client exchanges data with the massive small file storage system; it includes the operations for creating and reading small files.
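As an illustration only, one entry of the index position mapping table could be modeled as below. The seven fields follow the description above; the class name `MappingEntry`, the field names, the "ip:port" string form, and the tab-separated line format for the dirToDatanode file are assumptions made for this sketch and are not specified in the source.

```python
from dataclasses import dataclass, astuple


@dataclass
class MappingEntry:
    """One row of the index position mapping table (seven fields)."""
    directory: str
    primary: str          # data node identified as "ip:port" (assumed form)
    primary_flag: str     # update flag, "Y" or "N"
    replica1: str
    replica1_flag: str
    replica2: str
    replica2_flag: str

    def to_line(self) -> str:
        # Tab-separated text is an assumed on-disk format for dirToDatanode.
        return "\t".join(astuple(self))

    @classmethod
    def from_line(cls, line: str) -> "MappingEntry":
        # Inverse of to_line: split the seven tab-separated fields.
        return cls(*line.rstrip("\n").split("\t"))
```

A table can then be held as a dictionary keyed by directory, with each entry written as one line when the table is saved.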
In the second step, the massive small file storage system is initialized, comprising the following steps:
2.1 Initialize the index position mapping table by reading its data from the file dirToDatanode. If the dirToDatanode file is empty, the index position mapping table is initialized as an empty table. Thereafter, whenever the index position mapping table is modified, all of its data is saved to the dirToDatanode file again.
2.2 Initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir. If the waitArrangeDir file is empty, waitArrangeQueue is initialized as an empty queue. Thereafter, whenever waitArrangeQueue is modified, all of its data is saved to the waitArrangeDir file again.
2.3 Initialize the index data structure Index as an empty B-tree. Index reads index data dynamically from the index files and index log files according to the requests made by clients.
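The load-then-rewrite behavior of steps 2.1 and 2.2 can be sketched as follows for the waiting queue. This is a minimal sketch under stated assumptions: the helper names `load_queue` and `save_queue` and the one-directory-per-line file format are hypothetical, and a local file stands in for wherever the metadata node actually keeps waitArrangeDir.

```python
import os
from collections import deque


def load_queue(path):
    """Return the queue stored at `path`; a missing or empty file gives an empty queue."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return deque()
    with open(path, encoding="utf-8") as f:
        # One directory path per line (assumed format); blank lines are skipped.
        return deque(line.rstrip("\n") for line in f if line.strip())


def save_queue(queue, path):
    """Rewrite the whole file, mirroring 'all data is saved again on each modification'."""
    with open(path, "w", encoding="utf-8") as f:
        for directory in queue:
            f.write(directory + "\n")
```

Rewriting the whole file on every change is simple but costs time proportional to the queue length; the description accepts this because the queue only holds directories awaiting allocation.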
In the third step, the client's SmallFileAPI operates on small files according to instructions received from the keyboard: to create a small file, perform the fourth step; to read a small file, go to the eighth step.
In the fourth step, SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the data file under the directory indicated by the small file path (the small file directory for short) and the index position mapping entry for that directory. A data file is a file that actually exists in the namespace of the master-slave distributed file system and stores the data of all small files under the same directory. A data file consists of a data file header followed by data records. The data file header consists of four fields: the first field occupies three bytes and describes the file type, to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number (Version) of the data file; the third field gives the key type, that is, the data type in which keys are stored; the fourth field gives the value type, that is, the data type in which values are stored. The header is followed immediately by one or more records, each of which stores the complete data of one small file. Each record consists of four fields: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the file name. Each small file is stored as one record, and a small file is stored by appending it directly to the end of the data file.
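The data file layout described above can be illustrated with a small encoder and decoder. The source fixes only the three-byte type field and the one-byte version field; the magic value `SMF`, the one-byte type codes, the 4-byte big-endian length fields, and the choice that the record length counts the bytes after the length field are all assumptions made for this sketch.

```python
import struct

MAGIC = b"SMF"    # assumed 3-byte file-type tag
VERSION = 1       # 1-byte version number
KEY_TYPE = 0      # assumed 1-byte code: 0 = UTF-8 string key
VALUE_TYPE = 1    # assumed 1-byte code: 1 = raw bytes value


def encode_header() -> bytes:
    """Header: file type (3 bytes), version, key type, value type."""
    return MAGIC + struct.pack(">BBB", VERSION, KEY_TYPE, VALUE_TYPE)


def encode_record(name: str, content: bytes) -> bytes:
    """Record: record length, key length, key (file name), value (content)."""
    key = name.encode("utf-8")
    body = struct.pack(">I", len(key)) + key + content
    return struct.pack(">I", len(body)) + body


def append_record(data: bytearray, name: str, content: bytes) -> int:
    """Append one small file to the end of the data file; return its offset."""
    offset = len(data)
    data += encode_record(name, content)
    return offset


def decode_record(buf: bytes, offset: int):
    """Read back the (file name, content) pair stored at `offset`."""
    (rec_len,) = struct.unpack_from(">I", buf, offset)
    (key_len,) = struct.unpack_from(">I", buf, offset + 4)
    key_start = offset + 8
    key = bytes(buf[key_start:key_start + key_len]).decode("utf-8")
    value = bytes(buf[key_start + key_len:offset + 4 + rec_len])
    return key, value
```

Because records are only ever appended, the offset returned at write time stays valid and is all the index needs to locate the small file later.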
4.1 The client's SmallFileAPI checks whether the cache module holds the information for the small file directory, namely the index position mapping entry for this directory and the input/output information of the small file data file. If the information can be obtained from the cache module, go to the fifth step; if it is not found in the cache module, perform step 4.2.
4.2 The client's SmallFileAPI extracts the path of the small file directory from the path of the small file. If this small file directory does not exist, it is created; at the same time the index position maintenance module of the metadata node allocates three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as an entry.
The index position maintenance module allocates three data nodes to the directory as follows:
4.2.1 The index position maintenance module selects three data nodes at random from the full data node information maintained by the metadata node (the data nodes of a master-slave distributed file system register with the metadata node, so the metadata node holds the information of every data node in the cluster). If three data nodes are obtained, their update flags are initialized to Y and the procedure goes to step 4.2.3; if three data nodes cannot be found, perform step 4.2.2.
4.2.2 Add the directory to which data nodes could not be allocated to the queue waitArrangeQueue, and save the content of waitArrangeQueue to the waitArrangeDir file again. Return a failure signal to the client and go to the thirteenth step, ending the operation.
4.2.3 The index position maintenance module saves the index position mapping table to the dirToDatanode file again.
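Steps 4.2.1 to 4.2.3 can be sketched as below. The function name `allocate_nodes`, the dictionary layout of an entry, and the optional `rng` parameter are hypothetical; persisting the table and the queue to dirToDatanode and waitArrangeDir is left out of the sketch.

```python
import random


def allocate_nodes(directory, live_nodes, mapping_table, wait_queue, rng=random):
    """Allocate three data nodes to `directory`, or queue it and return None."""
    if len(live_nodes) < 3:
        # Step 4.2.2: not enough registered data nodes, so remember the directory.
        wait_queue.append(directory)
        return None
    # Step 4.2.1: pick three distinct data nodes at random.
    primary, replica1, replica2 = rng.sample(list(live_nodes), 3)
    entry = {
        "directory": directory,
        "primary": primary, "primary_flag": "Y",
        "replica1": replica1, "replica1_flag": "Y",
        "replica2": replica2, "replica2_flag": "Y",
    }
    # The new mapping becomes one entry of the index position mapping table.
    mapping_table[directory] = entry
    return entry
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in tests.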
4.3 Set the variable X = 1.
4.4 If the data file dataX under the small file directory does not exist, the metadata node creates dataX under the small file directory.
4.5 If the data file dataX under the small file directory exists, the client's SmallFileAPI requests from the metadata node the output information of dataX. If the output information of dataX is obtained, perform step 4.6. If it is not obtained (dataX is at this moment held by another client), increase X by 1; if X <= P, go to step 4.4; if X > P, return an error message to the client and go to the thirteenth step. P is the number of data files that can be created under the directory; P is a positive integer set by the user, typically P = 32.
4.6 The client's SmallFileAPI sends a request to the index position maintenance module of the metadata node to query the data nodes for the small file directory. The index position maintenance module looks up the index position mapping table and returns the entry for this small file directory to the client.
4.7 The client's SmallFileAPI records the index position mapping entry for this small file directory and the data file output information of this small file in the cache module. If the cache module is full, records are evicted by the LRU (least recently used) algorithm.
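The search over data1 to dataP in steps 4.3 to 4.5 can be sketched as a simple loop. The function `choose_data_file` and the two sets that stand in for the metadata node's state are hypothetical, and the sketch assumes that a data file created in step 4.4 is then tried immediately in step 4.5.

```python
def choose_data_file(existing, busy, P=32):
    """Return the first data file name this client can write to, or None.

    existing: set of data file names already in the directory (grows on create)
    busy:     set of data file names currently held by other clients
    """
    x = 1                              # step 4.3
    while x <= P:
        name = f"data{x}"
        if name not in existing:       # step 4.4: create the missing data file
            existing.add(name)
        if name not in busy:           # step 4.5: output information obtained
            return name
        x += 1                         # held by another client, try the next one
    return None                        # X > P: an error is reported to the client
```

With this scheme up to P clients can write under one directory at the same time, each appending to its own data file.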
In the fifth step, the client writes the data of the small file into the data file. The client writes the small file as one data record into the data file dataX obtained in step 4.5 and returns the offset of the small file in the data file, that is, the position of the data record within the data file.
In the sixth step, the small file index module of the data node creates the small file index. The client sends the small file path, the name of the data file storing the small file, the offset of the small file in the data file, and the data node update flag to the primary data node among the three data nodes in the index position mapping entry obtained in step 4.6, and requests creation of the small file index. If the primary data node has failed, a failure result is returned to the client and the procedure goes to the thirteenth step. If the primary data node is working normally, then after it receives the request its small file index module performs the following:
6.1 The small file index module decides according to the update flag of the primary data node whether the small file index under this directory needs to be updated: if the update flag is Y, perform step 6.2; if the update flag is N, go to step 6.3.
6.2 The small file index module reads two files in the distributed file system, whose paths are /index/[small file directory path].index and /index/[small file directory path].log. The [small file directory path].index file is called the index file and holds all small file index records under this directory, saved after being sorted by small file path name with the B-tree data structure. The [small file directory path].log file is called the log file and holds records of operations on the small file index under this directory, including index creation and deletion; each log record consists of an operation type and an index record. The operation type is the action performed, such as create or delete.
The small file index module reads the index data from the index file and the log file as follows:
6.2.1 The small file index module reads the data of the index file, builds a B-tree from this data, and inserts the B-tree as a node into the in-memory index data structure Index.
6.2.2 The small file index module reads the index operation records in the log file in order and replays the operations according to these records. For example, if the operation type is create, the index information of the operation record is extracted and inserted into the index data structure Index using the B-tree insertion algorithm.
6.2.3 The small file index module renames [small file directory path].index to [small file directory path].index.tmp, creates a new index file named [small file directory path].index, saves all index records for this small file directory in Index to the new index file, and deletes the index file with the .tmp suffix.
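The snapshot-plus-log recovery of steps 6.2.1 and 6.2.2 can be illustrated as follows. This sketch deliberately swaps the B-tree for a plain dictionary that is sorted at the end, which keeps the example short while producing the same ordered result; the function name `rebuild_index` and the tuple layouts are assumptions.

```python
def rebuild_index(index_records, log_records):
    """Rebuild one directory's index from its index file and log file contents.

    index_records: iterable of (path, data_file, offset) from the .index file
    log_records:   iterable of (op, (path, data_file, offset)) from the .log file,
                   where op is "create" or "delete"
    Returns the index records sorted by small file path name.
    """
    by_path = {rec[0]: rec for rec in index_records}   # step 6.2.1: load snapshot
    for op, rec in log_records:                        # step 6.2.2: replay in order
        if op == "create":
            by_path[rec[0]] = rec
        elif op == "delete":
            by_path.pop(rec[0], None)
        else:
            raise ValueError(f"unknown operation type: {op}")
    return [by_path[p] for p in sorted(by_path)]
```

The list returned here corresponds to what step 6.2.3 would write into the new index file before the log is cleared.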
6.3 Clear the content of the [small file directory path].log file; the small file index module obtains the write operation information of this log file and prepares to write log records.
6.4 The small file index module generates an index record from the small file path name, data file name, and offset obtained from the client, searches the index data structure Index, inserts the index record at the position in the Index tree determined by sorting on the small file path name, and writes the create-index operation to the log file obtained in step 6.3.
6.5 The small file index module sends the client a signal that the small file was created successfully.
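The two-level ordering of Index (directories first, then small file path names within a directory) can be sketched with sorted lists in place of B-trees. The class `SmallFileIndex` is hypothetical, the standard `bisect` module is used as a stand-in for B-tree insertion and search, and duplicate paths and log writing are not handled.

```python
import bisect
import posixpath


class SmallFileIndex:
    """Per-directory sorted index of (path, data_file, offset) records."""

    def __init__(self):
        self._dirs = {}   # directory path -> list of records sorted by path

    def insert(self, path, data_file, offset):
        # Step 6.4: place the record at its sorted position within its directory.
        records = self._dirs.setdefault(posixpath.dirname(path), [])
        bisect.insort(records, (path, data_file, offset))

    def lookup(self, path):
        # Binary search; the 1-tuple (path,) sorts just before any full record
        # that starts with the same path.
        records = self._dirs.get(posixpath.dirname(path), [])
        i = bisect.bisect_left(records, (path,))
        if i < len(records) and records[i][0] == path:
            return records[i]
        return None
```

A sorted list gives the same logarithmic lookup as a B-tree but linear-time insertion, so it is suitable only as an illustration of the ordering, not as a replacement at scale.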
In the seventh step, the client's cache module modifies the update flags in the cache record for the small file directory: the update flag of the primary data node is changed to N, and the update flags of the first replica data node and the second replica data node are changed to Y. Go to the thirteenth step.
The fourth through seventh steps constitute the small file creation process of the massive small file storage system.
In the eighth step, the client's SmallFileAPI obtains the small file directory from the small file path and searches the client cache module by that directory. If the data node information for the small file directory is not found, perform the ninth step; if it is found, go to the tenth step.
In the ninth step, the client's SmallFileAPI sends a request to the index position maintenance module of the metadata node to query the index position of the directory. The index position maintenance module looks up the index position mapping table and returns the index position mapping entry for this small file directory to the client, which records the information obtained in the cache module.
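The client cache module, with its capacity of M directories and LRU eviction, can be sketched with the standard library's `OrderedDict`. The class name `DirectoryCache` and its method names are hypothetical; the record stored per directory is left opaque.

```python
from collections import OrderedDict


class DirectoryCache:
    """Client-side cache holding records for at most `capacity` directories."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity M must be a positive integer")
        self.capacity = capacity
        self._records = OrderedDict()   # least recently used entry comes first

    def get(self, directory):
        if directory not in self._records:
            return None                 # cache miss: ask the metadata node
        self._records.move_to_end(directory)
        return self._records[directory]

    def put(self, directory, record):
        if directory in self._records:
            self._records.move_to_end(directory)
        self._records[directory] = record
        if len(self._records) > self.capacity:
            self._records.popitem(last=False)   # evict the least recently used
```

A cache hit in the eighth step lets the client skip the metadata node entirely and go straight to a data node.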
In the tenth step, the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small file index. If the update flag is Y, the small file index module updates the index of all small files under the small file directory. The specific steps are as follows:
10.1 The small file index module reads the data of [small file directory path].index, builds a B-tree from it, and inserts the B-tree as a node into the index data structure Index.
10.2 The small file index module reads the index operation records of [small file directory path].log in order and replays the operations according to these records.
10.3 The data node queries Index through the small file index module and returns the index record of the small file to the client.
In the eleventh step, the client's SmallFileAPI queries the client cache module by the data file name in the index record of the small file to obtain the input information of the data file. If it is not cached, the file-reading interface of the distributed file system is used to obtain the input information of the data file of the small file, which is then recorded in the cache module.
In the twelfth step, the client's SmallFileAPI reads the data of the small file from the data file according to the offset of the small file in the data file recorded in its index record.
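Reading by offset in the twelfth step amounts to one seek followed by two reads. The sketch below assumes a record layout of 4-byte big-endian record length, 4-byte key length, key, then value, with the record length counting the bytes after the length field; these widths are assumptions, and `read_small_file` is a hypothetical helper operating on any seekable binary stream.

```python
import io
import struct


def read_small_file(data_file, offset):
    """Return (file name, content) of the record that starts at `offset`."""
    data_file.seek(offset)
    (rec_len,) = struct.unpack(">I", data_file.read(4))   # record length
    record = data_file.read(rec_len)
    (key_len,) = struct.unpack_from(">I", record, 0)      # key length
    name = record[4:4 + key_len].decode("utf-8")          # key = file name
    return name, record[4 + key_len:]                     # value = content


# Usage: build a tiny in-memory data file with a 6-byte header and one record.
key = b"a.jpg"
body = struct.pack(">I", len(key)) + key + b"hello"
stream = io.BytesIO(b"SMF\x01\x00\x01" + struct.pack(">I", len(body)) + body)
result = read_small_file(stream, 6)
```

Only the bytes of the requested record are read, so the cost of a read does not depend on how many other small files share the data file.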
The eighth through twelfth steps constitute the small file reading process of the massive small file storage system.
In the thirteenth step, the client's SmallFileAPI checks whether there is further instruction input; if so, go to the third step; if not, end.
The present invention is a method for storing massive small files based on a master-slave distributed file system. It achieves the following technical effects:
(1) In the fifth step, the small file data is stored in the master-slave distributed file system, which provides distributed storage and fault tolerance for the data and achieves massive storage capacity and data reliability.
(2) In the sixth step, the small file indexes are distributed to the data nodes for management, which solves the problem of the single metadata node when storing massive small files. Meanwhile, step 6.2 stores the small file indexes in the distributed file system, so the fault tolerance mechanism of the distributed file system itself protects the small file indexes and reduces the risk of losing them.
(3) On this basis, the client cache module of the massive small file storage system caches the small file index position information and the data file information that the user commonly uses, which avoids frequent interaction with the metadata node and greatly improves system performance.
Experiments show that the present invention handles the problem of the huge volume of metadata for massive small files well, greatly improves the write efficiency of the massive small file storage system, and ensures the reliability of the system through fault tolerance of the small file indexes.
Description of drawings
Fig. 1 is a structural diagram of the master-slave distributed file system of the background art;
Fig. 2 is an overall structural diagram of the massive small file storage system deployed in the first step of the present invention;
Fig. 3 is an overall flow chart of the present invention;
Fig. 4 is a structural diagram of the index position mapping table of the present invention;
Fig. 5 is a structural diagram of the data file created in the fourth step of the present invention;
Fig. 6 is a structural diagram of the small file index record generated by the small file index module in step 6.4 of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a structural diagram of a master-slave distributed file system.
Fig. 2 is an overall structural diagram of the mass small-file storage system built in the first step of the present invention. The mass small-file storage system consists of a master-slave distributed file system and the software that handles massive small files on each node of that file system. This software comprises an index position maintenance module on the metadata node, a small-file index module on each data node, a client cache module, and SmallFileAPI, the dedicated client interface for operating on small files.
The index position maintenance module assigns data nodes (identified by IP address and port number) to each directory, keeps the mappings between directories and data nodes sorted, and returns to the client the data nodes that manage a small file's index, that is, the data nodes assigned to the small file's directory. The module stores the mappings between directories and data nodes in an index position mapping table. As shown in Fig. 4, each entry of the index position mapping table has seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag. Each of the three update flags takes one of two values, "Y" or "N". "Y" indicates that the small-file index for this directory on the data node is up to date and does not need updating; "N" indicates that it is not up to date and needs updating. The index position maintenance module creates a queue, waitArrangeQueue, whose entries are directory paths. The queue records directories to which data nodes could not be assigned successfully; when a new data node joins the distributed file system, the index position maintenance module reassigns the directories in waitArrangeQueue. The module also creates two empty files on disk, dirToDatanode and waitArrangeDir: dirToDatanode stores the contents of the index position mapping table, and waitArrangeDir stores the contents of the queue waitArrangeQueue.
The capacity of the client cache module is initialized to M, so that it can hold the cache records of M directories. Each cache record stores the addresses of the data nodes that manage the small-file index for a directory the user has recently used (that is, the directory's index position mapping entry) together with the input/output information of the small-file data files. M is a positive integer set by the user as needed.
The small-file index module receives index creation and query requests from the client, decides from the update flag in the index position mapping table whether the small-file index data must be loaded, and returns to the client either the small-file index or the result of whether creation succeeded. On startup, the small-file index module creates the small-file index data structure Index, a B-tree ordered by directory path. Each node of Index is the set of small-file index records under one directory, which is itself a B-tree ordered by small-file name. A small-file index record represents the index of one small file and contains the small file's path name, the path of its data file, and the small file's offset within the data file.
SmallFileAPI is the software through which the client exchanges data with the mass small-file storage system; it provides the operations for creating and reading small files.
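The client cache module described above, with capacity M and one record per directory, can be sketched as follows. This is only an illustration under stated assumptions: the class name `ClientCache`, the record layout, and the use of Python's `OrderedDict` are hypothetical and not taken from the patent; the eviction policy follows the least-recently-used rule named in step 4.7 of claim 1.

```python
from collections import OrderedDict


class ClientCache:
    """Per-directory cache with capacity M and least-recently-used eviction."""

    def __init__(self, capacity_m):
        if capacity_m < 1:
            raise ValueError("M must be a positive integer")
        self.capacity = capacity_m
        self._records = OrderedDict()  # directory path -> cache record

    def get(self, directory):
        """Return the record for a directory, or None on a cache miss."""
        record = self._records.get(directory)
        if record is not None:
            self._records.move_to_end(directory)  # mark as most recently used
        return record

    def put(self, directory, mapping_entry, io_info=None):
        """Store the mapping entry and data file I/O info for a directory."""
        if directory not in self._records and len(self._records) >= self.capacity:
            self._records.popitem(last=False)  # evict the least recently used
        self._records[directory] = {"mapping": mapping_entry, "io": io_info}
        self._records.move_to_end(directory)


cache = ClientCache(capacity_m=2)
cache.put("/a", ("node1", "node2", "node3"))
cache.put("/b", ("node2", "node3", "node4"))
cache.get("/a")                      # "/a" becomes the most recently used
cache.put("/c", ("node1", "node3", "node4"))
print(cache.get("/b"))               # None: "/b" was evicted
```

With M = 2, inserting a third directory evicts whichever of the first two was touched least recently, so a frequently used directory keeps its data node addresses on the client.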
Fig. 3 is the overall flow chart of the present invention.
In the first step, the mass small-file storage system is deployed.
In the second step, the mass small-file storage system is initialized.
In the third step, the operation on the small file is selected: to create a small file, go to the fourth step; to read a small file, go to the eighth step.
In the fourth step, the client's SmallFileAPI creates the data file that will store the small file, according to the path of the small file being created.
In the fifth step, the client's SmallFileAPI writes the data of the small file into the data file.
In the sixth step, the small-file index module creates the index of the small file.
In the seventh step, the client modifies the update flag of the primary data node in the cache record that corresponds to the small file's directory in the cache module, then goes to the thirteenth step.
In the eighth step, the client's SmallFileAPI derives the small file's directory from the small file's path and looks up the data node information for that directory in the cache module. If the data node information is not found, the ninth step is executed; if it is found, go to the tenth step.
In the ninth step, the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module of the metadata node, which returns the three data nodes and their update flags to the client; the client records the information obtained in the cache module.
In the tenth step, the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small-file index.
In the eleventh step, the client's SmallFileAPI looks up the input information of the data file in the cache module, using the data file name in the small file's index record. If it is not there, the client uses the distributed file system's file-reading interface to obtain the input information of the small file's data file and records it in the cache module.
In the twelfth step, the client reads the small file's data from the data file according to the small file's offset in the data file, taken from the small file's index record.
In the thirteenth step, the client's SmallFileAPI checks whether there is further instruction input from the keyboard; if so, go to the third step; if not, the procedure ends.
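The read path (the eighth to twelfth steps) can be sketched as below. Everything here is a hypothetical in-memory stand-in: the function name `read_small_file`, the dictionaries that play the roles of the metadata node, the data nodes' indexes, and the data files, and the 4-byte length prefix in front of each stored file are all assumptions made for illustration. The update-flag check of the tenth step is left out.

```python
import posixpath
import random
import struct


def read_small_file(path, cache, metadata_node, data_nodes, data_files):
    """Locate a small file's index record, then read its data.

    cache:         dict, directory -> list of three data node ids (client cache)
    metadata_node: dict, directory -> list of three data node ids (mapping table)
    data_nodes:    dict, node id -> {file path: (data file name, offset)}
    data_files:    dict, data file name -> bytes
    """
    directory = posixpath.dirname(path)               # step 8: derive directory
    nodes = cache.get(directory)
    if nodes is None:                                  # step 9: ask metadata node
        nodes = metadata_node[directory]
        cache[directory] = nodes
    node = random.choice(nodes)                        # step 10: any of the three
    data_file, offset = data_nodes[node][path]         # index record lookup
    blob = data_files[data_file]                       # step 11: open data file
    (length,) = struct.unpack_from(">I", blob, offset)  # assumed length prefix
    return blob[offset + 4:offset + 4 + length]        # step 12: read at offset


payload = b"hello"
data_files = {"/docs/data1": struct.pack(">I", len(payload)) + payload}
index = {"/docs/a.txt": ("/docs/data1", 0)}
data_nodes = {"n1": index, "n2": index, "n3": index}
metadata_node = {"/docs": ["n1", "n2", "n3"]}
cache = {}
print(read_small_file("/docs/a.txt", cache, metadata_node, data_nodes, data_files))
```

After the first read the directory's data nodes are in the client cache, so a second read under the same directory would skip the request to the metadata node.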
Fig. 4 shows the structure of the mapping table between small-file directories and data nodes. Each directory corresponds to three data nodes, and each data node field gives the specific location of a data node. The update flag that follows each data node indicates whether the small-file index currently held in that data node's memory needs to be updated: Y means that it needs to be updated, and N means that it does not.
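One row of the mapping table in Fig. 4 could be modelled as in the sketch below. The class name `MappingEntry`, its field names, and the example addresses are hypothetical. The flags are read as in the description of Fig. 4 (Y means the data node's in-memory index must be refreshed), and `mark_after_create` mirrors the flag changes of the seventh step.

```python
from dataclasses import dataclass


@dataclass
class MappingEntry:
    """One row of the index position mapping table (seven fields)."""
    directory: str
    primary: str            # "ip:port" of the primary data node
    primary_flag: str       # "Y" or "N"
    replica1: str
    replica1_flag: str
    replica2: str
    replica2_flag: str

    def mark_after_create(self):
        """Seventh step: flags after an index is created on the primary node."""
        self.primary_flag = "N"
        self.replica1_flag = "Y"
        self.replica2_flag = "Y"

    def nodes_needing_update(self):
        """Data nodes whose flag is Y, whose in-memory index must be refreshed."""
        pairs = [(self.primary, self.primary_flag),
                 (self.replica1, self.replica1_flag),
                 (self.replica2, self.replica2_flag)]
        return [node for node, flag in pairs if flag == "Y"]


entry = MappingEntry("/docs", "10.0.0.1:9000", "Y",
                     "10.0.0.2:9000", "Y", "10.0.0.3:9000", "Y")
entry.mark_after_create()
print(entry.nodes_needing_update())   # ['10.0.0.2:9000', '10.0.0.3:9000']
```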
Fig. 5 shows the structure of the data file in which small files are stored. A data file consists of a data file header followed by data records. The header consists of four fields: the first field occupies three bytes and describes the file type, so as to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number (Version) of the data file; the third field gives the key type, that is, the data type in which keys are stored; the fourth field gives the value type, that is, the data type in which values are stored. The header is immediately followed by one or more records, each of which stores the complete data of one small file. Each record consists of four items: record length, key length, key, and value. The key is the file name of the small file, the value is the content of the small file, and the key length is the length of the small file's name.
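The layout in Fig. 5 can be illustrated with a small encoder and decoder. The text fixes only the widths of the first two header fields, so the rest is assumed for this sketch: the marker bytes `SFD`, one-byte key and value type codes, 4-byte big-endian length fields, and a record length that counts the bytes following the record length field.

```python
import io
import struct

MAGIC = b"SFD"          # assumed 3-byte file type marker
VERSION = 1             # 1-byte version number
KEY_TYPE = 0            # assumed 1-byte code: key stored as UTF-8 text
VALUE_TYPE = 1          # assumed 1-byte code: value stored as raw bytes


def write_header(f):
    """Write the four header fields: file type, version, key type, value type."""
    f.write(MAGIC + struct.pack(">BBB", VERSION, KEY_TYPE, VALUE_TYPE))


def append_record(f, name, content):
    """Append one small file as a record and return its offset in the data file."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    key = name.encode("utf-8")
    # Record length counts the bytes after the length field itself (assumption).
    record_length = 4 + len(key) + len(content)
    f.write(struct.pack(">II", record_length, len(key)) + key + content)
    return offset


def read_record(f, offset):
    """Read the record at the given offset; return (file name, content)."""
    f.seek(offset)
    record_length, key_length = struct.unpack(">II", f.read(8))
    key = f.read(key_length)
    value = f.read(record_length - 4 - key_length)
    return key.decode("utf-8"), value


data_file = io.BytesIO()
write_header(data_file)
first = append_record(data_file, "a.txt", b"hello")
second = append_record(data_file, "b.txt", b"small file")
print(first, second)                     # 6 24
print(read_record(data_file, second))    # ('b.txt', b'small file')
```

The offset returned by `append_record` is the value that an index record would store, which is what lets a reader jump straight to one small file without scanning the data file.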
Fig. 6 is a structural diagram of the index record generated by the small-file index module in step 6.4. Each index record contains the path of the small file, the data file name, and the small file's offset within the data file.
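The index record of Fig. 6 and the two-level structure Index that holds it can be sketched as follows. The names `IndexRecord` and `Index` follow the text, but the implementation is an assumption: sorted lists maintained with `bisect` stand in for the B-trees, which keeps the ordering property while staying short.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    path: str          # full path of the small file
    data_file: str     # name of the data file holding the small file
    offset: int        # offset of the small file's record in that data file


class Index:
    """Two-level ordered index: directory path -> records ordered by file path.

    Sorted lists kept with bisect stand in for the B-trees named in the text.
    """

    def __init__(self):
        self._dirs = []        # sorted directory paths
        self._records = {}     # directory -> (sorted paths, parallel records)

    def insert(self, directory, record):
        if directory not in self._records:
            bisect.insort(self._dirs, directory)
            self._records[directory] = ([], [])
        paths, records = self._records[directory]
        i = bisect.bisect_left(paths, record.path)
        if i < len(paths) and paths[i] == record.path:
            records[i] = record                 # replace an existing entry
        else:
            paths.insert(i, record.path)
            records.insert(i, record)

    def lookup(self, directory, path):
        paths, records = self._records.get(directory, ([], []))
        i = bisect.bisect_left(paths, path)
        if i < len(paths) and paths[i] == path:
            return records[i]
        return None


index = Index()
index.insert("/docs", IndexRecord("/docs/b.txt", "data1", 24))
index.insert("/docs", IndexRecord("/docs/a.txt", "data1", 6))
print(index.lookup("/docs", "/docs/a.txt").offset)   # 6
print(index.lookup("/docs", "/docs/missing"))        # None
```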

Claims (3)

1. A method for storing massive small files based on a master-slave distributed file system, characterized by comprising the following steps:
In the first step, a mass small-file storage system is deployed; the mass small-file storage system consists of a master-slave distributed file system and the software that handles massive small files on each node of the master-slave distributed file system; this software comprises an index position maintenance module on the metadata node, a small-file index module on each data node, a client cache module, and SmallFileAPI, the dedicated client interface for operating on small files; the index position maintenance module assigns data nodes to each directory, keeps the mappings between directories and data nodes sorted, and returns to the client the data nodes that manage a small file's index, that is, the data nodes assigned to the small file's directory; the index position maintenance module stores the mappings between directories and data nodes in an index position mapping table; each entry of the index position mapping table consists of seven fields: directory, primary data node, primary data node update flag, first replica data node, first replica data node update flag, second replica data node, and second replica data node update flag; each of the three update flags takes one of two values, "Y" or "N"; "Y" indicates that the small-file index for this directory on the data node is up to date and does not need updating, and "N" indicates that it is not up to date and needs updating; the index position maintenance module creates a queue waitArrangeQueue, whose entries are directory paths and which records directories to which data nodes could not be assigned successfully, so that when a new data node joins the distributed file system the index position maintenance module reassigns the directories in waitArrangeQueue; the index position maintenance module also creates two empty files on disk, file dirToDatanode and file waitArrangeDir, where dirToDatanode stores the contents of the index position mapping table and waitArrangeDir stores the contents of the queue waitArrangeQueue; the capacity of the client cache module is initialized to M, each cache record stores the index position mapping entry of a directory and the input/output information of the small-file data files, and M is a positive integer; the small-file index module receives index creation and query requests from the client, decides from the update flag in the index position mapping table whether the small-file index data must be loaded, and returns to the client either the small-file index or the result of whether creation succeeded; on startup the small-file index module creates the small-file index data structure Index, a B-tree ordered by directory path, each node of which is the set of small-file index records under one directory and is itself a B-tree ordered by small-file name; a small-file index record represents the index of one small file and contains the small file's path name, the path of its data file, and the small file's offset within the data file; SmallFileAPI is the software through which the client exchanges data with the mass small-file storage system, and it provides the operations for creating and reading small files;
In the second step, the mass small-file storage system is initialized, comprising the following steps:
2.1 initialize the index position mapping table by reading its data from the file dirToDatanode; if the dirToDatanode file is empty, the index position mapping table is initialized as an empty table;
2.2 initialize the waiting queue waitArrangeQueue by reading the queue data from the file waitArrangeDir; if the waitArrangeDir file is empty, waitArrangeQueue is initialized as an empty queue;
2.3 initialize the index data structure Index as an empty B-tree;
In the third step, the client's SmallFileAPI operates on small files according to the instruction received from the keyboard: to create a small file, execute the fourth step; to read a small file, go to the eighth step;
In the fourth step, SmallFileAPI obtains from the client the path of the small file to be created, and then obtains the directory indicated by that path, the data file under that directory, and the index position mapping entry corresponding to that directory, as follows:
4.1 the client's SmallFileAPI queries whether the cache module contains the information for the small file's directory, which comprises the directory's index position mapping entry and the input/output information of the small-file data file; if the information can be obtained from the cache module, go to the fifth step; if it is not found in the cache module, execute step 4.2;
4.2 the client's SmallFileAPI extracts the path of the small file's directory from the path of the small file; if this directory does not exist, the directory is created, and at the same time the index position maintenance module of the metadata node assigns three data nodes to the directory and inserts the mapping between the directory and the three data nodes into the index position mapping table as an entry; the index position maintenance module assigns three data nodes to the directory as follows:
4.2.1 the index position maintenance module randomly selects three data nodes from the information on all data nodes maintained by the metadata node; if three data nodes are obtained successfully, the update flags of these three data nodes are initialized to Y and the method goes to step 4.2.3; if three data nodes are not found, step 4.2.2 is executed;
4.2.2 the directory to which data nodes cannot be assigned is added to the queue waitArrangeQueue, the contents of waitArrangeQueue are saved to the waitArrangeDir file, a failure signal is returned to the client, and the method goes to the thirteenth step, ending the operation;
4.2.3 the index position maintenance module saves the index position mapping table to the dirToDatanode file;
4.3 let variable X = 1;
4.4 if the data file dataX does not exist under the small file's directory, dataX is created under that directory by the metadata node;
4.5 if the data file dataX exists under the small file's directory, the client's SmallFileAPI requests from the metadata node the output information of the data file dataX under the small file's directory; if the output information of dataX is obtained successfully, step 4.6 is executed; if it is not, X is incremented by 1, and if X <= P the method goes to step 4.4, whereas if X > P an error message is returned to the client and the method goes to the thirteenth step; P is the number of data files that can be created under this directory, P is a positive integer whose value is set by the user, and in general P = 32;
4.6 the client's SmallFileAPI sends the index position maintenance module of the metadata node a request to query the data nodes corresponding to the small file's directory; the index position maintenance module queries the index position mapping table and returns the entry corresponding to this directory to the client;
4.7 the client's SmallFileAPI records in the cache module the index position mapping entry corresponding to this directory and the output information of this small file's data file; if the cache module is full, an entry is evicted by the LRU (least recently used) algorithm;
In the fifth step, the client writes the small file as one data record into the data file dataX obtained in step 4.5, and returns the small file's offset within the data file, that is, the position of the data record in the data file;
In the sixth step, the small-file index module of the data node creates the small-file index; the client sends the small file's path, the name of the data file storing the small file, the small file's offset within the data file, and the data node update flag to the primary data node among the three data nodes of the index position mapping entry obtained in step 4.6, and requests creation of the small-file index; if this primary data node has failed, a failure result is returned to the client and the method goes to the thirteenth step; if the primary data node is working normally, then after the primary data node receives the request its small-file index module performs the following:
6.1 the small-file index module determines from the update flag of the primary data node whether the small-file index under this directory needs to be updated; if the update flag is Y, step 6.2 is executed; if the update flag is N, the method goes to step 6.3;
6.2 the small-file index module reads from the distributed file system the two files whose paths are /index/[small-file directory path].index and /index/[small-file directory path].log; the file [small-file directory path].index is called the index file and holds all the small-file index records under this directory, which are saved after being sorted by small-file path name in a B-tree data structure; the file [small-file directory path].log is called the log file and holds the records of operations on the small-file index under this directory, each consisting of an operation type and an index record; the index data is read from the index file and the log file as follows:
6.2.1 the small-file index module reads the data of [small-file directory path].index, builds a B-tree from this data, and inserts it as a node into the in-memory index data structure Index;
6.2.2 the small-file index module reads the index operation records in [small-file directory path].log in order and replays the operations according to these records;
6.2.3 the small-file index module renames [small-file directory path].index to [small-file directory path].index.tmp, creates a new index file named [small-file directory path].index, saves to the new index file all the index records in Index that correspond to this directory, and deletes the index file with the .tmp suffix;
6.3 the contents of the file [small-file directory path].log are cleared, and the small-file index module obtains the write-operation information for this log file in preparation for writing log operations;
6.4 the small-file index module generates an index record from the small-file path name, data file name, and offset information obtained from the client, searches the index data structure Index, inserts the index record at the position in the Index tree determined by ordering on small-file path name, and writes the index creation operation into the file [small-file directory path].log obtained in step 6.3;
6.5 the small-file index module sends the client a signal that the small file was created successfully;
In the seventh step, the client cache module modifies the update flags in the cache record corresponding to the small file's directory: the update flag of the primary data node is changed to N, and the update flags of the other two nodes, the first replica data node and the second replica data node, are changed to Y; go to the thirteenth step;
In the eighth step, the client's SmallFileAPI derives the small file's directory from the small file's path and searches the client cache module by that directory; if the data node information corresponding to the directory cannot be found, the ninth step is executed; if it is found, go to the tenth step;
In the ninth step, the client's SmallFileAPI sends a request to query the directory's index position to the index position maintenance module of the metadata node; the index position maintenance module queries the index position mapping table and returns the index position mapping entry corresponding to this directory to the client, and the client records the information obtained in the cache module;
In the tenth step, the client's SmallFileAPI selects any one of the three data nodes and sends it a request to query the small-file index; if the update flag is Y, the small-file index module updates the index of all small files under the small file's directory; the specific steps are as follows:
10.1 the small-file index module reads the data of [small-file directory path].index, builds a B-tree from it, and inserts it as a node into the index data structure Index;
10.2 the small-file index module reads the index operation records of [small-file directory path].log in order and replays the operations according to these records;
10.3 the data node queries Index through the small-file index module and returns the small file's index record to the client;
In the eleventh step, the client's SmallFileAPI queries the client cache module using the data file name in the small file's index record to obtain the input information of the data file; if it is not there, the client uses the distributed file system's file-reading interface to obtain the input information of the small file's data file and records it in the cache module;
In the twelfth step, the client's SmallFileAPI reads the small file's data from the data file according to the small file's offset in the data file, taken from the small file's index record;
In the thirteenth step, the client's SmallFileAPI checks whether there is further instruction input; if so, go to the third step; if not, the method ends.
2. The method for storing massive small files based on a master-slave distributed file system according to claim 1, characterized in that the data file is a file that actually exists in the namespace of the master-slave distributed file system and is used to store the data of all small files under the same directory; the data file consists of a data file header followed by data records; the data file header consists of four fields: the first field occupies three bytes and describes the file type, so as to distinguish data files from other ordinary files; the second field occupies one byte and gives the version number of the data file; the third field gives the key type, that is, the data type in which keys are stored; the fourth field gives the value type, that is, the data type in which values are stored; the data file header is immediately followed by one or more records, each of which stores the complete data of one small file, and each record consists of four items: record length, key length, key, and value; the key is the file name of the small file, the value is the content of the small file, and the key length is the length of the small file's name; each small file is one record, and when a small file is stored it is appended directly to the end of the data file.
3. The method for storing massive small files based on a master-slave distributed file system according to claim 1, characterized in that P = 32.
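The role of P in steps 4.3 to 4.5 of claim 1 can be illustrated with the following sketch. The function `acquire_data_file` and its three callbacks are hypothetical stand-ins for requests to the metadata node, and the idea that a data file's output information is unavailable because another writer holds it is an assumption used only to drive the example.

```python
def acquire_data_file(directory, exists, create, open_for_append, p=32):
    """Try data1..dataP under a directory and return (name, output handle).

    exists, create and open_for_append are hypothetical callbacks standing in
    for metadata node requests; open_for_append returns None when the file's
    output handle cannot be obtained (for example, another writer holds it).
    """
    for x in range(1, p + 1):                      # 4.3: start at X = 1
        name = f"{directory}/data{x}"
        if not exists(name):                       # 4.4: create dataX if missing
            create(name)
        handle = open_for_append(name)             # 4.5: request output info
        if handle is not None:
            return name, handle
    raise RuntimeError(f"no writable data file among {p} candidates")


files = {"/docs/data1"}
busy = {"/docs/data1"}                             # data1 is held by another writer
name, handle = acquire_data_file(
    "/docs",
    exists=lambda n: n in files,
    create=files.add,
    open_for_append=lambda n: None if n in busy else f"handle:{n}",
)
print(name)    # /docs/data2
```

Under this reading, P bounds both the number of data files per directory and the number of clients that can append under one directory at the same time.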
CN201310009182.4A 2013-01-10 2013-01-10 Method for storing mass of small files on basis of master-slave distributed file system Active CN103020315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310009182.4A CN103020315B (en) 2013-01-10 2013-01-10 Method for storing mass of small files on basis of master-slave distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310009182.4A CN103020315B (en) 2013-01-10 2013-01-10 Method for storing mass of small files on basis of master-slave distributed file system

Publications (2)

Publication Number Publication Date
CN103020315A true CN103020315A (en) 2013-04-03
CN103020315B CN103020315B (en) 2015-08-19

Family

ID=47968918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310009182.4A Active CN103020315B (en) 2013-01-10 2013-01-10 Method for storing mass of small files on basis of master-slave distributed file system

Country Status (1)

Country Link
CN (1) CN103020315B (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353901A (en) * 2013-08-01 2013-10-16 百度在线网络技术(北京)有限公司 Orderly table data management method and system based on Hadoop distributed file system (HDFS)
CN103366016A (en) * 2013-08-01 2013-10-23 南京大学 Electronic file concentrated storing and optimizing method based on HDFS
CN103731369A (en) * 2013-12-27 2014-04-16 乐视网信息技术(北京)股份有限公司 Method and system for updating data queue in server
CN104123359A (en) * 2014-07-17 2014-10-29 江苏省邮电规划设计院有限责任公司 Resource management method of distributed object storage system
CN104394222A (en) * 2014-11-26 2015-03-04 盐城师范学院 Cloud storage system and method
CN104536908A (en) * 2014-11-05 2015-04-22 北京中安比特科技有限公司 Single-machine-oriented mass small record efficient storage and management method
CN105005611A (en) * 2015-07-10 2015-10-28 中国海洋大学 File management system and file management method
CN105094992A (en) * 2015-09-25 2015-11-25 浪潮(北京)电子信息产业有限公司 File request processing method and system
CN105279166A (en) * 2014-06-20 2016-01-27 中国电信股份有限公司 File management method and system
CN105447040A (en) * 2014-08-29 2016-03-30 阿里巴巴集团控股有限公司 Binary file management and update method and device, and binary file management system
CN105677904A (en) * 2016-02-04 2016-06-15 杭州数梦工场科技有限公司 Distributed file system based small file storage method and device
CN106570113A (en) * 2016-10-25 2017-04-19 中国电力科学研究院 Cloud storage method and system for mass vector slice data
CN106775446A (en) * 2016-11-11 2017-05-31 中国人民解放军国防科学技术大学 Based on the distributed file system small documents access method that solid state hard disc accelerates
CN106776702A (en) * 2016-11-11 2017-05-31 北京奇虎科技有限公司 A kind of method and apparatus for processing the index in master-slave mode Database Systems
CN106843770A (en) * 2017-01-23 2017-06-13 北京思特奇信息技术股份有限公司 A kind of distributed file system small file data storage, read method and device
CN106844417A (en) * 2016-11-21 2017-06-13 深圳市深信服电子科技有限公司 Thermomigration process and device based on file directory
WO2018058949A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Data storage method, device and system
CN108345693A (en) * 2018-03-16 2018-07-31 中国银行股份有限公司 A kind of document handling method and device
CN109100951A (en) * 2018-11-01 2018-12-28 广东粤迪厚创科技发展有限公司 A kind of smart home system based on big data
CN109144948A (en) * 2017-06-15 2019-01-04 海马云(天津)信息技术有限公司 Method, apparatus, electronic equipment and the memory of application file positioning
CN109271361A (en) * 2018-08-13 2019-01-25 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Distributed storage method and system for massive small files
CN109408487A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 Document handling system and method under a kind of NAS file system
CN110275865A (en) * 2019-06-20 2019-09-24 珠海天燕科技有限公司 File storage optimization method and device
WO2019223377A1 (en) * 2018-05-25 2019-11-28 杭州海康威视系统技术有限公司 File processing method, apparatus and device, and storage medium
CN110908927A (en) * 2018-09-14 2020-03-24 慧荣科技股份有限公司 Data storage device and method for deleting name space thereof
CN111026707A (en) * 2019-11-05 2020-04-17 中国科学院计算机网络信息中心 Access method and device for small file object
CN111125216A (en) * 2019-12-10 2020-05-08 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN111258956A (en) * 2019-03-22 2020-06-09 深圳市远行科技股份有限公司 Method and equipment for pre-reading mass data files facing far end
CN111352586A (en) * 2020-02-23 2020-06-30 苏州浪潮智能科技有限公司 Directory aggregation method, device, equipment and medium for accelerating file reading and writing
CN111459882A (en) * 2020-03-30 2020-07-28 北京百度网讯科技有限公司 Namespace transaction processing method and device of distributed file system
CN112181937A (en) * 2019-07-04 2021-01-05 北京京东振世信息技术有限公司 Data transmission method and device
CN112612857A (en) * 2020-12-07 2021-04-06 国网北京市电力公司 Data processing method and device, computer readable storage medium and processor
CN113986838A (en) * 2021-12-28 2022-01-28 成都云祺科技有限公司 Mass small file processing method and system based on file system and storage medium
CN114020216A (en) * 2021-11-03 2022-02-08 南京中孚信息技术有限公司 Method for improving tray falling speed of small-capacity file
CN114048185A (en) * 2021-11-18 2022-02-15 北京聚存科技有限公司 Method for transparently packaging, storing and accessing massive small files in distributed file system
CN114116613A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Metadata query method, equipment and storage medium based on distributed file system
US11301154B2 (en) 2016-02-06 2022-04-12 Huawei Technologies Co., Ltd. Distributed storage method and device
CN114356230A (en) * 2021-12-22 2022-04-15 天津南大通用数据技术股份有限公司 Method and system for improving reading performance of column storage engine
CN117519612A (en) * 2024-01-06 2024-02-06 深圳市杉岩数据技术有限公司 Mass small file storage system and method based on index online splicing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556590A (en) * 2008-04-09 2009-10-14 北京闻言科技有限公司 Method for accessing small high-volume file by classification
CN101996250A (en) * 2010-11-15 2011-03-30 中国科学院计算技术研究所 Hadoop-based mass stream data storage and query method and system
CN102222092A (en) * 2011-06-03 2011-10-19 复旦大学 Massive high-dimension data clustering method for MapReduce platform
CN102436491A (en) * 2011-11-08 2012-05-02 张三明 System and method used for searching huge amount of pictures and based on BigBase

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
向小军等: "基于Hadoop平台的海量文本分类的并行化", 《计算机科学》 *

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366016A (en) * 2013-08-01 2013-10-23 南京大学 Electronic file concentrated storing and optimizing method based on HDFS
CN103353901A (en) * 2013-08-01 2013-10-16 百度在线网络技术(北京)有限公司 Orderly table data management method and system based on Hadoop distributed file system (HDFS)
CN103731369A (en) * 2013-12-27 2014-04-16 乐视网信息技术(北京)股份有限公司 Method and system for updating data queue in server
CN105279166A (en) * 2014-06-20 2016-01-27 中国电信股份有限公司 File management method and system
CN104123359B (en) * 2014-07-17 2017-03-22 江苏省邮电规划设计院有限责任公司 Resource management method of distributed object storage system
CN104123359A (en) * 2014-07-17 2014-10-29 江苏省邮电规划设计院有限责任公司 Resource management method of distributed object storage system
CN105447040B (en) * 2014-08-29 2020-02-07 阿里巴巴集团控股有限公司 Binary file management and updating method, device and system
CN105447040A (en) * 2014-08-29 2016-03-30 阿里巴巴集团控股有限公司 Binary file management and update method and device, and binary file management system
CN104536908A (en) * 2014-11-05 2015-04-22 北京中安比特科技有限公司 Single-machine-oriented mass small record efficient storage and management method
CN104536908B (en) * 2014-11-05 2017-12-29 中安威士(北京)科技有限公司 A kind of magnanimity small records efficient storage management method towards unit
CN104394222A (en) * 2014-11-26 2015-03-04 盐城师范学院 Cloud storage system and method
CN105005611B (en) * 2015-07-10 2018-11-30 中国海洋大学 A kind of file management system and file management method
CN105005611A (en) * 2015-07-10 2015-10-28 中国海洋大学 File management system and file management method
CN105094992B (en) * 2015-09-25 2018-11-02 浪潮(北京)电子信息产业有限公司 A kind of method and system of processing file request
CN105094992A (en) * 2015-09-25 2015-11-25 浪潮(北京)电子信息产业有限公司 File request processing method and system
CN105677904A (en) * 2016-02-04 2016-06-15 杭州数梦工场科技有限公司 Distributed file system based small file storage method and device
CN105677904B (en) * 2016-02-04 2019-07-12 杭州数梦工场科技有限公司 Small documents storage method and device based on distributed file system
US11301154B2 (en) 2016-02-06 2022-04-12 Huawei Technologies Co., Ltd. Distributed storage method and device
WO2018058949A1 (en) * 2016-09-30 2018-04-05 华为技术有限公司 Data storage method, device and system
CN106570113B (en) * 2016-10-25 2022-04-01 中国电力科学研究院 Mass vector slice data cloud storage method and system
CN106570113A (en) * 2016-10-25 2017-04-19 中国电力科学研究院 Cloud storage method and system for mass vector slice data
CN106776702B (en) * 2016-11-11 2021-03-05 北京奇虎科技有限公司 Method and device for processing indexes in master-slave database system
CN106776702A (en) * 2016-11-11 2017-05-31 北京奇虎科技有限公司 A kind of method and apparatus for processing the index in master-slave mode Database Systems
CN106775446A (en) * 2016-11-11 2017-05-31 中国人民解放军国防科学技术大学 Based on the distributed file system small documents access method that solid state hard disc accelerates
CN106844417A (en) * 2016-11-21 2017-06-13 深圳市深信服电子科技有限公司 Thermomigration process and device based on file directory
CN106844417B (en) * 2016-11-21 2020-07-28 深信服科技股份有限公司 Hot migration method and device based on file directory
CN106843770A (en) * 2017-01-23 2017-06-13 北京思特奇信息技术股份有限公司 A kind of distributed file system small file data storage, read method and device
CN109144948A (en) * 2017-06-15 2019-01-04 海马云(天津)信息技术有限公司 Method, apparatus, electronic equipment and the memory of application file positioning
CN109144948B (en) * 2017-06-15 2021-10-08 海马云(天津)信息技术有限公司 Application file positioning method and device, electronic equipment and memory
CN108345693B (en) * 2018-03-16 2022-01-28 中国银行股份有限公司 File processing method and device
CN108345693A (en) * 2018-03-16 2018-07-31 中国银行股份有限公司 File processing method and device
WO2019223377A1 (en) * 2018-05-25 2019-11-28 杭州海康威视系统技术有限公司 File processing method, apparatus and device, and storage medium
CN109271361A (en) * 2018-08-13 2019-01-25 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Distributed storage method and system for massive small files
CN109271361B (en) * 2018-08-13 2020-07-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Distributed storage method and system for massive small files
CN110908927A (en) * 2018-09-14 2020-03-24 慧荣科技股份有限公司 Data storage device and method for deleting name space thereof
CN109100951A (en) * 2018-11-01 2018-12-28 广东粤迪厚创科技发展有限公司 Smart home system based on big data
CN109408487A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 File processing system and method under NAS file system
CN111258956A (en) * 2019-03-22 2020-06-09 深圳市远行科技股份有限公司 Method and device for prereading far-end mass data files
CN111258956B (en) * 2019-03-22 2023-11-24 深圳市远行科技股份有限公司 Method and device for prereading far-end mass data files
CN110275865B (en) * 2019-06-20 2021-08-27 珠海天燕科技有限公司 File storage optimization method and device
CN110275865A (en) * 2019-06-20 2019-09-24 珠海天燕科技有限公司 File storage optimization method and device
CN112181937B (en) * 2019-07-04 2023-11-03 北京京东振世信息技术有限公司 Method and device for transferring data
CN112181937A (en) * 2019-07-04 2021-01-05 北京京东振世信息技术有限公司 Data transmission method and device
CN111026707A (en) * 2019-11-05 2020-04-17 中国科学院计算机网络信息中心 Access method and device for small file object
CN111026707B (en) * 2019-11-05 2023-01-17 中国科学院计算机网络信息中心 Access method and device for small file object
CN111125216B (en) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN111125216A (en) * 2019-12-10 2020-05-08 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN111352586A (en) * 2020-02-23 2020-06-30 苏州浪潮智能科技有限公司 Directory aggregation method, device, equipment and medium for accelerating file reading and writing
CN111459882A (en) * 2020-03-30 2020-07-28 北京百度网讯科技有限公司 Namespace transaction processing method and device of distributed file system
CN111459882B (en) * 2020-03-30 2023-08-29 北京百度网讯科技有限公司 Namespace transaction processing method and device for distributed file system
CN112612857A (en) * 2020-12-07 2021-04-06 国网北京市电力公司 Data processing method and device, computer readable storage medium and processor
CN114020216A (en) * 2021-11-03 2022-02-08 南京中孚信息技术有限公司 Method for improving the disk write speed of small-capacity files
CN114020216B (en) * 2021-11-03 2024-03-08 南京中孚信息技术有限公司 Method for improving the disk write speed of small-capacity files
CN114048185A (en) * 2021-11-18 2022-02-15 北京聚存科技有限公司 Method for transparently packaging, storing and accessing massive small files in distributed file system
CN114116613A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Metadata query method, equipment and storage medium based on distributed file system
CN114356230A (en) * 2021-12-22 2022-04-15 天津南大通用数据技术股份有限公司 Method and system for improving reading performance of column storage engine
CN114356230B (en) * 2021-12-22 2024-04-23 天津南大通用数据技术股份有限公司 Method and system for improving read performance of column storage engine
CN113986838A (en) * 2021-12-28 2022-01-28 成都云祺科技有限公司 Mass small file processing method and system based on file system and storage medium
CN117519612A (en) * 2024-01-06 2024-02-06 深圳市杉岩数据技术有限公司 Mass small file storage system and method based on index online splicing
CN117519612B (en) * 2024-01-06 2024-04-12 深圳市杉岩数据技术有限公司 Mass small file storage system and method based on index online splicing

Also Published As

Publication number Publication date
CN103020315B (en) 2015-08-19

Similar Documents

Publication Publication Date Title
CN103020315B (en) Method for storing mass of small files on basis of master-slave distributed file system
US9710535B2 (en) Object storage system with local transaction logs, a distributed namespace, and optimized support for user directories
Jiang et al. The optimization of HDFS based on small files
EP2780834B1 (en) Processing changes to distributed replicated databases
Vora Hadoop-HBase for large-scale data
CN102541985A (en) Organization method of client directory cache in distributed file system
CN111427847B (en) Indexing and querying method and system for user-defined metadata
CN104408111A (en) Method and device for deleting duplicate data
CN103282899A (en) File system data storage method and access method and device therefor
US20120290595A1 (en) Super-records
CN106570113B (en) Mass vector slice data cloud storage method and system
CN106446099A (en) Distributed cloud storage method and system and uploading and downloading method thereof
CN103501319A (en) Low-delay distributed storage system for small files
CN105138275B (en) Data sharing method for Lustre storage system
CN113377868A (en) Offline storage system based on distributed KV database
Changtong An improved HDFS for small file
CN103942301B (en) Distributed file system oriented to access and application of multiple data types
Zhai et al. Hadoop perfect file: A fast and memory-efficient metadata access archive file to face small files problem in hdfs
CN104899161A (en) Cache method based on continuous data protection of cloud storage environment
US10387384B1 (en) Method and system for semantic metadata compression in a two-tier storage system using copy-on-write
WO2017023709A1 (en) Object storage system with local transaction logs, a distributed namespace, and optimized support for user directories
CN109800208B (en) Network traceability system and its data processing method, computer storage medium
Zhang et al. Blockchain storage middleware based on external database
Eddoujaji et al. Data processing on distributed systems storage challenges
Xue et al. A novel approach in improving I/O performance of small meteorological files on HDFS

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant