CN113076298A - Distributed small file storage system - Google Patents

Info

Publication number: CN113076298A
Authority: CN (China)
Prior art keywords: file, node, masternode, datanode, directory tree
Legal status: Pending
Application number: CN202110404012.0A
Other languages: Chinese (zh)
Inventors: 许士松, 朱坤奎
Current assignee: Shanghai Zhuo Steel Chain Technology Co ltd
Original assignee: Shanghai Zhuo Steel Chain Technology Co ltd
Priority date: 2021-04-15
Filing date: 2021-04-15
Publication date: 2021-07-06

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services

Abstract

The invention discloses a distributed small file storage system comprising a MasterNode and a plurality of DataNodes deployed in a Master-Slave architecture. The MasterNode operates on and manages a file directory tree and manages the DataNodes; the DataNodes store the file data recorded in the file directory tree. The file directory tree is stored in a Redis database cluster, and when the MasterNode operates on the file directory tree, it obtains the tree from the Redis database cluster. The invention solves the problem that large numbers of small files cannot be stored efficiently.

Description

Distributed small file storage system
Technical Field
The invention belongs to the technical field of data storage, and particularly relates to a distributed small file storage system.
Background
The invention is motivated mainly by two trends. First, enterprises are accelerating their digital transformation and need to store massive amounts of data. Second, large-scale distributed file systems have developed rapidly, especially alongside current big data technology; distributed file storage is now widely used in enterprises, and the technology is relatively mature.
At present, distributed file storage is widely used in enterprises and plays an important role in data storage, data backup, data mining, machine learning and similar tasks, and the functions of distributed file storage systems keep expanding as the technology improves. The basic functions of a distributed file storage system are storing files, providing various interfaces through which users store files on servers, and providing storage and backup, so that all kinds of files can be conveniently stored on the servers.
The wide application of distributed file storage systems is also an important background of the invention. A distributed file storage system is built on file reading, writing and management: it can store files on a server (that is, write them to the server's disk), download and view files from the server (that is, read them from the server's disk), and manage the file directories of the whole file system.
The distributed file storage systems in wide use today are mainly FastDFS, developed in C, and Hadoop, developed in Java.
FastDFS lacks a backup notification mechanism: once a copy has been successfully written to one storage node, a failure of that source storage node while the copy is being synchronized to the other backup storage nodes may lose user data, which is unacceptable for a file storage system. In addition, FastDFS lacks an automatic recovery mechanism, and data recovery is inefficient.
Hadoop is a product of big data storage. Although it offers high reliability, high scalability and high fault tolerance, its architecture makes it unsuitable for low-latency data access. Moreover, Hadoop manages the file directory tree in memory, which creates a memory bottleneck: massive numbers of small files occupy a large amount of memory, so they cannot be stored efficiently.
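A back-of-the-envelope estimate makes the memory bottleneck concrete. It assumes the widely quoted rule of thumb that each file, directory or block held in the HDFS NameNode's heap costs roughly 150 bytes; the real figure varies with Hadoop version and JVM settings, so treat the result as an order of magnitude only.

```python
# Rough estimate of the in-memory namespace cost of many small files.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory
# or block), a commonly quoted rule of thumb rather than an exact figure.
BYTES_PER_OBJECT = 150


def namespace_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap bytes for num_files files, each with blocks_per_file blocks."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Ten million small files, each fitting in a single block.
    print(f"{namespace_bytes(10_000_000) / 1e9:.1f} GB")  # prints "3.0 GB"
```

The cost scales with the number of files, not the number of bytes stored, which is why many small files strain an in-memory directory tree far more than a few large ones.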
Disclosure of Invention
The technical problem to be solved by the invention is to provide a distributed small file storage system that overcomes the inability to store large numbers of small files efficiently.
To solve the above technical problem, the invention adopts the following technical solution: a distributed small file storage system comprising a MasterNode and a plurality of DataNodes deployed in a Master-Slave architecture;
the MasterNode operates on and manages a file directory tree and manages the DataNodes; the DataNodes store the file data recorded in the file directory tree; the file directory tree is stored in a Redis database cluster, and when the MasterNode operates on the file directory tree, it obtains the tree from the Redis database cluster.
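The patent stores the directory tree in a Redis cluster but does not describe its key layout. The following is a minimal sketch assuming a hypothetical layout in which each directory is a Redis hash (key `dir:<path>`) mapping child names to a metadata string, and showing the add, delete, query and modify operations on the tree. `FakeRedis` is an in-memory stand-in that mimics only the `hset`/`hget`/`hdel`/`hgetall` hash commands so the sketch runs without a server; the redis-py client exposes methods with the same names. `DirectoryTree` and the layout are illustrative, not the patent's data model.

```python
from typing import Dict, Optional, Tuple


class FakeRedis:
    """In-memory stand-in for the few Redis hash commands used below."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        h = self._hashes.setdefault(name, {})
        is_new = key not in h
        h[key] = value
        return int(is_new)  # Redis returns the number of newly added fields

    def hget(self, name: str, key: str) -> Optional[str]:
        return self._hashes.get(name, {}).get(key)

    def hdel(self, name: str, *keys: str) -> int:
        h = self._hashes.get(name, {})
        return sum(1 for k in keys if h.pop(k, None) is not None)

    def hgetall(self, name: str) -> Dict[str, str]:
        return dict(self._hashes.get(name, {}))


def _split(path: str) -> Tuple[str, str]:
    """Split "/a/b.txt" into ("/a", "b.txt") and "/b.txt" into ("/", "b.txt")."""
    parent, _, name = path.rstrip("/").rpartition("/")
    return (parent or "/"), name


class DirectoryTree:
    """Directory tree whose entries live in a Redis-like store, not local memory.

    Hypothetical layout: hash "dir:<parent>" maps child name -> metadata.
    """

    def __init__(self, store) -> None:
        self.store = store

    def add(self, path: str, meta: str) -> None:          # data adding
        parent, name = _split(path)
        self.store.hset(f"dir:{parent}", name, meta)

    def query(self, path: str) -> Optional[str]:           # data querying
        parent, name = _split(path)
        return self.store.hget(f"dir:{parent}", name)

    def modify(self, path: str, meta: str) -> bool:       # data modifying
        if self.query(path) is None:
            return False
        self.add(path, meta)
        return True

    def delete(self, path: str) -> bool:                  # data deleting
        parent, name = _split(path)
        return self.store.hdel(f"dir:{parent}", name) == 1

    def list_dir(self, path: str) -> Dict[str, str]:
        return self.store.hgetall(f"dir:{path}")
```

For example, `DirectoryTree(FakeRedis()).add("/img/a.png", "dn1")` records the file under the hash `dir:/img`. Because each directory is its own hash, listing a directory is a single `HGETALL`, and the MasterNode's own memory no longer grows with the number of files.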
The distributed small file storage system further comprises a SecondaryMasterNode, which maintains the same file directory tree as the MasterNode; operations performed by the MasterNode on the file directory tree are synchronized to the SecondaryMasterNode, and after applying them, the SecondaryMasterNode synchronizes the file directory tree to the Redis database cluster.
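One way to read this synchronization chain (MasterNode to SecondaryMasterNode to Redis cluster) is as operation shipping: the MasterNode forwards each tree operation, the SecondaryMasterNode applies it to its own copy and then writes the change through to the shared store, from which the MasterNode reads. The sketch assumes synchronous, in-process forwarding and uses a plain dict as a stand-in for the Redis cluster; the patent does not say whether forwarding is synchronous or how failures are handled, and the `Op` tuple format is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical operation record: (kind, path, metadata); kind is "put" or "del".
Op = Tuple[str, str, Optional[str]]


def apply_op(tree: Dict[str, str], op: Op) -> None:
    kind, path, meta = op
    if kind == "put":
        tree[path] = meta
    elif kind == "del":
        tree.pop(path, None)
    else:
        raise ValueError(f"unknown op {kind!r}")


@dataclass
class SecondaryMasterNode:
    shared_store: Dict[str, str]                 # stand-in for the Redis cluster
    tree: Dict[str, str] = field(default_factory=dict)

    def replicate(self, op: Op) -> None:
        apply_op(self.tree, op)                  # keep an identical copy of the tree
        apply_op(self.shared_store, op)          # then sync the change to the store


@dataclass
class MasterNode:
    shared_store: Dict[str, str]                 # reads go to the Redis stand-in
    secondary: SecondaryMasterNode

    def execute(self, op: Op) -> None:
        self.secondary.replicate(op)             # every tree operation is forwarded

    def lookup(self, path: str) -> Optional[str]:
        return self.shared_store.get(path)
```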
In the distributed small file storage system, the MasterNode's operations on the file directory tree generate an editslog file; at every interval T, the SecondaryMasterNode backs up the file directory tree, the resulting fsimage file is synchronized to the MasterNode, and the MasterNode clears the editslog entries recorded before the fsimage file was generated.
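The checkpoint cycle can be sketched as follows. All of the following are hypothetical: the editslog is a list of `(sequence_number, op)` records, the fsimage is a JSON snapshot tagged with the last sequence number it covers, and "clearing the editslog before the fsimage was generated" is taken to mean dropping records up to that tag. Recovery then loads the fsimage and replays only the newer records. The periodic trigger (every T) is left out; a scheduler would call `checkpoint` and `install_fsimage`.

```python
import json
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]   # hypothetical: ("put" | "del", path, metadata)


def apply_op(tree: Dict[str, str], op: Op) -> None:
    kind, path, meta = op
    if kind == "put":
        tree[path] = meta
    else:
        tree.pop(path, None)


class MasterNode:
    def __init__(self) -> None:
        self.seq = 0
        self.editslog: List[Tuple[int, Op]] = []
        self.fsimage: Optional[str] = None       # last snapshot received

    def log(self, op: Op) -> None:
        self.seq += 1
        self.editslog.append((self.seq, op))

    def install_fsimage(self, fsimage: str) -> None:
        self.fsimage = fsimage
        covered = json.loads(fsimage)["last_seq"]
        # Clear editslog entries already captured by the fsimage.
        self.editslog = [(s, op) for s, op in self.editslog if s > covered]

    def recover(self) -> Dict[str, str]:
        tree: Dict[str, str] = {}
        if self.fsimage is not None:
            tree = dict(json.loads(self.fsimage)["tree"])
        for _, op in self.editslog:              # replay only the tail of the log
            apply_op(tree, op)
        return tree


def checkpoint(tree: Dict[str, str], last_seq: int) -> str:
    """SecondaryMasterNode side: snapshot the tree it maintains into an fsimage."""
    return json.dumps({"last_seq": last_seq, "tree": tree})
```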
In the distributed small file storage system, the MasterNode's operations on the file directory tree include adding, deleting, querying and/or modifying data in the file directory tree.
In the distributed small file storage system, the DataNodes communicate with each other via the gRPC protocol.
In the distributed small file storage system, each DataNode sends heartbeat packets to the MasterNode via the gRPC protocol to report its own status.
The distributed small file storage system further comprises a client, through which a user accesses the MasterNode to operate on the file directory tree and uploads files to and/or reads files from the DataNodes.
In the distributed small file storage system, the client uploads a file to a DataNode by the following steps:
Step 1: the client sends a file upload request to the MasterNode;
Step 2: the MasterNode queries the file directory tree to determine whether the ID of the file being uploaded is already recorded. If it is, the MasterNode asks the client whether to overwrite the file and proceeds to the next step if the client confirms; if it is not, the MasterNode proceeds directly to the next step;
Step 3: the MasterNode queries the DataNode list and returns the location of the writable DataNode closest in network distance, the network distance being the communication distance between the client and the DataNode;
Step 4: the client establishes a pipeline with the returned DataNode; once the pipeline is established, the client streams the file data to the DataNode through a SocketStream;
Step 5: as the DataNode receives the data, it writes the data to a file via an IO stream and synchronizes the data to a backup DataNode via a SocketStream;
Step 6: after the data has been written, the DataNode reports successful file writing to the MasterNode and the client.
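Steps 2 and 3 amount to an existence check followed by choosing the nearest writable node. The sketch assumes the network distance is available as a number per DataNode (for example a measured round-trip time in milliseconds) and that a node "can upload files" when it is alive and has enough free space; the patent defines neither. The client's answer to the overwrite question is modeled as the `overwrite` flag. `DataNodeInfo`, `plan_upload` and `UploadRejected` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataNodeInfo:
    address: str
    alive: bool
    free_bytes: int
    distance: float  # ASSUMPTION: e.g. client-to-node round-trip time in ms


class UploadRejected(Exception):
    pass


def plan_upload(directory: Dict[str, str], file_id: str, size: int,
                nodes: List[DataNodeInfo], overwrite: bool) -> str:
    # Step 2: is the file ID already recorded in the directory tree?
    if file_id in directory and not overwrite:
        raise UploadRejected(f"{file_id} exists and overwrite was declined")
    # Step 3: closest DataNode that can accept the file.
    candidates = [n for n in nodes if n.alive and n.free_bytes >= size]
    if not candidates:
        raise UploadRejected("no writable DataNode")
    return min(candidates, key=lambda n: n.distance).address
```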
In the distributed small file storage system, the client reads a file from a DataNode by the following steps:
Step a: the client sends the name of the file it wants to read to the MasterNode;
Step b: the MasterNode looks up the location of the file by its file name;
Step c: the MasterNode returns the file location to the client, selected according to network distance, the network distance being the communication distance between the client and the DataNode;
Step d: the client reads the file from the DataNode at the returned location.
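Steps a to d can be sketched as a name lookup that returns the file's replica locations ordered by network distance, after which the client reads from the nearest one. Assumptions: the directory tree maps a file name to the DataNodes holding a copy (for example a primary and its backup), distances are given as numbers, and the client falls back to the next replica if a read fails; the fallback is not described in the patent. `locate`, `read_file` and the `fetch` callback (a stand-in for the actual transfer) are hypothetical.

```python
from typing import Callable, Dict, List


def locate(directory: Dict[str, List[str]], name: str,
           distance: Dict[str, float]) -> List[str]:
    """Steps b-c: find the file's DataNodes and order them by network distance."""
    if name not in directory:
        raise FileNotFoundError(name)
    return sorted(directory[name], key=lambda node: distance.get(node, float("inf")))


def read_file(name: str, directory: Dict[str, List[str]],
              distance: Dict[str, float],
              fetch: Callable[[str, str], bytes]) -> bytes:
    """Steps a and d: request locations, then read from the nearest reachable node."""
    last_error: Exception = ConnectionError("no replicas")
    for node in locate(directory, name, distance):
        try:
            return fetch(node, name)
        except ConnectionError as err:   # ASSUMPTION: try the next-closest replica
            last_error = err
    raise last_error
```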
Compared with the prior art, the invention has the following advantage: storing the file directory tree in the Redis database cluster solves the memory bottleneck problem.
The technical solution of the invention is described in further detail below with reference to the accompanying drawings and embodiments.
Drawings
Fig. 1 is a block diagram of the system of the invention.
Fig. 2 is a schematic diagram of the file upload process of the invention.
Fig. 3 is a schematic diagram of the file reading process of the invention.
Detailed Description
As shown in Fig. 1, the distributed small file storage system comprises a MasterNode and a plurality of DataNodes deployed in a Master-Slave architecture;
the MasterNode operates on and manages a file directory tree and manages the DataNodes; the DataNodes store the file data recorded in the file directory tree; the file directory tree is stored in a Redis database cluster, and when the MasterNode operates on the file directory tree, it obtains the tree from the Redis database cluster.
It should be noted that building a Redis database cluster as a metadata center to store the file directory tree greatly alleviates the problem of insufficient memory and thereby removes the memory bottleneck seen in Hadoop, so that large numbers of small files can be stored.
The distributed small file storage system further comprises a SecondaryMasterNode, an assistant node of the MasterNode: because the MasterNode carries a heavy workload, a separate node is needed to help it back up the file directory. The SecondaryMasterNode maintains the same file directory tree as the MasterNode; operations performed by the MasterNode on the file directory tree are synchronized to the SecondaryMasterNode, and after applying them, the SecondaryMasterNode synchronizes the file directory tree to the Redis database cluster.
The distributed small file storage system further comprises a client, through which a user accesses the MasterNode to operate on the file directory tree and uploads files to and/or reads files from the DataNodes.
It should be noted that the DFSClient on the client side is the module of the system that provides the user-facing interface: users can use the DFSClient module to create, view and delete directories and to upload, download and delete files. The module can also be used as a dependency by other systems, which can then access the cluster through a specific API and operate on the system.
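The DFSClient surface described here (directory create, view and delete; file upload, download and delete) could be expressed as an interface that other systems depend on. The method names and signatures below are hypothetical, since the patent does not publish the API; the in-memory implementation exists only so the interface can be exercised without a cluster.

```python
from typing import Dict, List, Protocol, Set


class DFSClient(Protocol):
    def mkdir(self, path: str) -> None: ...
    def listdir(self, path: str) -> List[str]: ...
    def rmdir(self, path: str) -> None: ...
    def upload(self, path: str, data: bytes) -> None: ...
    def download(self, path: str) -> bytes: ...
    def delete(self, path: str) -> None: ...


class InMemoryDFSClient:
    """Local stand-in that satisfies the DFSClient protocol."""

    def __init__(self) -> None:
        self.dirs: Set[str] = {"/"}
        self.files: Dict[str, bytes] = {}

    def mkdir(self, path: str) -> None:
        self.dirs.add(path)

    def listdir(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        # Immediate children: first path component after the prefix.
        entries = {p[len(prefix):].split("/")[0]
                   for p in list(self.dirs) + list(self.files)
                   if p.startswith(prefix) and p != prefix}
        return sorted(entries)

    def rmdir(self, path: str) -> None:
        if self.listdir(path):
            raise OSError(f"directory not empty: {path}")
        self.dirs.discard(path)

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def download(self, path: str) -> bytes:
        return self.files[path]

    def delete(self, path: str) -> None:
        del self.files[path]
```

Coding other systems against the protocol rather than the concrete class is what would let them swap in a networked client later.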
In this embodiment, the MasterNode's operations on the file directory tree generate an editslog file. At every interval T (T can be set according to actual requirements), the SecondaryMasterNode backs up the file directory tree; the resulting fsimage file is synchronized to the MasterNode, and the MasterNode clears the editslog entries recorded before the fsimage file was generated.
It should be noted that the MasterNode is the core module of the system. It mainly provides external services and manages the file directory tree, the operation log and the DataNodes. All client requests are first sent to the MasterNode, which responds differently depending on the request type.
The most important function of the MasterNode is managing the operation log. Every client operation on the file system is recorded in the editslog file, so that after a failure the MasterNode can be restarted and replay the editslog once to rebuild the complete file directory tree without losing data. Because the editslog is written continuously, the file keeps growing; if no measures were taken, the MasterNode would eventually have to read an excessively large editslog file and performance would degrade. The editslog can therefore be stored in segments, so that the system only needs to read a small segment, which greatly improves efficiency.
In addition, replaying the editslog takes a long time, and as the operation log grows, recovering a crashed MasterNode would take too long. To address this, the SecondaryMasterNode also provides an fsimage function: at intervals it writes a backup of the file directory tree held in the Redis database into an fsimage file, synchronizes the fsimage file to the MasterNode, and clears the editslog entries before that point in time. When the crashed MasterNode recovers, it only needs to load the fsimage file and replay the remaining part of the editslog to obtain the complete file directory tree, which greatly shortens the MasterNode's recovery time.
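The segmented editslog can be sketched by rolling to a new segment file after a fixed number of records, so that clearing old entries means deleting whole segments and replay reads only the segments after the checkpoint. The segment size, the file naming (`edits_<first_seq>.log`, zero-padded so names sort numerically) and the JSON-lines record format are assumptions for illustration.

```python
import json
import os
import tempfile
from typing import Iterator, List


class SegmentedEditsLog:
    def __init__(self, directory: str, records_per_segment: int = 1000) -> None:
        self.directory = directory
        self.per_segment = records_per_segment
        self.seq = 0

    def _segment_path(self, first_seq: int) -> str:
        return os.path.join(self.directory, f"edits_{first_seq:012d}.log")

    def append(self, record: dict) -> int:
        self.seq += 1
        # Records 1..N go to the first segment, N+1..2N to the next, and so on.
        first = ((self.seq - 1) // self.per_segment) * self.per_segment + 1
        with open(self._segment_path(first), "a", encoding="utf-8") as f:
            f.write(json.dumps({"seq": self.seq, **record}) + "\n")
        return self.seq

    def segments(self) -> List[str]:
        return sorted(p for p in os.listdir(self.directory) if p.startswith("edits_"))

    def purge_through(self, checkpoint_seq: int) -> None:
        """Delete segments whose records are all covered by an fsimage."""
        for name in self.segments():
            first = int(name[len("edits_"):-len(".log")])
            if first + self.per_segment - 1 <= checkpoint_seq:
                os.remove(os.path.join(self.directory, name))

    def replay_after(self, checkpoint_seq: int) -> Iterator[dict]:
        for name in self.segments():
            with open(os.path.join(self.directory, name), encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["seq"] > checkpoint_seq:
                        yield rec


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = SegmentedEditsLog(d, records_per_segment=2)
        for i in range(5):
            log.append({"op": "put", "path": f"/f{i}"})
        log.purge_through(4)                              # an fsimage covers seq 1..4
        print(log.segments())                             # ['edits_000000000005.log']
        print([r["seq"] for r in log.replay_after(4)])    # [5]
```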
In addition, the MasterNode's operations on the file directory tree include adding, deleting, querying and/or modifying data in the file directory tree.
As shown in Fig. 1, the DataNodes communicate with each other via the gRPC protocol, and each DataNode sends heartbeat packets to the MasterNode via gRPC to report its own status. Maintaining communication over gRPC keeps the whole cluster efficient and low-latency.
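A sketch of the MasterNode's side of the heartbeat mechanism: it records the last report from each DataNode and treats nodes silent for longer than a timeout as unavailable. The gRPC transport is abstracted away; in a real deployment `receive` would be called from a gRPC service method. The `Heartbeat` fields and the 10-second default timeout are assumptions, not values from the patent. The clock is injectable so the timeout logic can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Heartbeat:
    node_id: str
    free_bytes: int       # hypothetical status fields a DataNode might report
    file_count: int


class HeartbeatMonitor:
    def __init__(self, timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}
        self.status: Dict[str, Heartbeat] = {}

    def receive(self, hb: Heartbeat) -> None:
        """Would be invoked by the gRPC handler for each heartbeat packet."""
        self.last_seen[hb.node_id] = self.clock()
        self.status[hb.node_id] = hb

    def live_nodes(self) -> List[str]:
        """DataNodes whose last heartbeat is within the timeout window."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)
```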
As shown in Fig. 2, the client uploads a file to a DataNode by the following steps:
Step 1: the client sends a file upload request to the MasterNode;
Step 2: the MasterNode queries the file directory tree to determine whether the ID of the file being uploaded is already recorded. If it is, the MasterNode asks the client whether to overwrite the file and proceeds to the next step if the client confirms; if it is not, the MasterNode proceeds directly to the next step;
Step 3: the MasterNode queries the DataNode list and returns the location of the writable DataNode closest in network distance, the network distance being the communication distance between the client and the DataNode;
Step 4: the client establishes a pipeline with the returned DataNode; once the pipeline is established, the client streams the file data to the DataNode through a SocketStream;
Step 5: as the DataNode receives the data, it writes the data to a file via an IO stream and synchronizes the data to a backup DataNode via a SocketStream;
Step 6: after the data has been written, the DataNode reports successful file writing to the MasterNode and the client.
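Steps 4 to 6 describe a two-hop write pipeline in which the primary DataNode writes each chunk locally while forwarding it to a backup. The sketch models the SocketStream connections as file-like objects (`io.BytesIO` in the usage) so it runs locally; the chunk size, the function names and the `ack` callback are assumptions. The patent does not say whether the success report waits for the backup copy; this sketch reports success only after both copies have been written.

```python
import io
from typing import BinaryIO, Callable, Iterable

CHUNK = 64 * 1024  # ASSUMPTION: streaming chunk size in bytes


def stream_chunks(src: BinaryIO, size: int = CHUNK) -> Iterable[bytes]:
    """Step 4 (client side): read the file and yield it as a stream of chunks."""
    while chunk := src.read(size):
        yield chunk


def write_pipeline(chunks: Iterable[bytes], local: BinaryIO, backup: BinaryIO,
                   ack: Callable[[int], None]) -> int:
    """Steps 5-6 (primary DataNode): write each chunk locally, forward it, then ack."""
    total = 0
    for chunk in chunks:
        local.write(chunk)    # IO-stream write to the local file
        backup.write(chunk)   # forward to the backup DataNode's stream
        total += len(chunk)
    local.flush()
    backup.flush()
    ack(total)                # report success to the MasterNode and client
    return total


if __name__ == "__main__":
    payload = b"x" * 150_000
    local, backup = io.BytesIO(), io.BytesIO()
    write_pipeline(stream_chunks(io.BytesIO(payload)), local, backup,
                   ack=lambda n: print(f"write ok: {n} bytes"))  # write ok: 150000 bytes
    assert local.getvalue() == backup.getvalue() == payload
```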
As shown in Fig. 3, the client reads a file from a DataNode by the following steps:
Step a: the client sends the name of the file it wants to read to the MasterNode;
Step b: the MasterNode looks up the location of the file by its file name;
Step c: the MasterNode returns the file location to the client, selected according to network distance, the network distance being the communication distance between the client and the DataNode;
Step d: the client reads the file from the DataNode at the returned location.
It should be noted that the system meets the need for distributed storage of massive numbers of small files and responds quickly. It offers a general-purpose API: a user can operate the cluster simply by adding the DFSClient dependency and doing some simple configuration. Because the system is designed specifically for small files, it fully considers the bottleneck, high-availability and scalability problems of small file storage; introducing the SecondaryMasterNode solves the problem of backing up the MasterNode's file directory and ensures high availability and fault tolerance of the whole cluster. The system has low hardware requirements: a cluster of a few inexpensive servers can provide efficient and reliable file storage, and the cluster can be scaled easily. In theory, storage capacity can keep growing as long as DataNodes continue to be added.
The above description is only a preferred embodiment of the invention and is not intended to limit it; any simple modification, change or equivalent structural change made to the above embodiment according to the technical essence of the invention still falls within the scope of the technical solution of the invention.

Claims (10)

1. A distributed small file storage system, characterized in that: the system comprises a MasterNode and a plurality of DataNodes deployed in a Master-Slave architecture;
the MasterNode operates on and manages a file directory tree and manages the DataNodes; the DataNodes store the file data recorded in the file directory tree; the file directory tree is stored in a Redis database cluster, and when the MasterNode operates on the file directory tree, it obtains the tree from the Redis database cluster.
2. The distributed small file storage system of claim 1, wherein: the system further comprises a SecondaryMasterNode, which maintains the same file directory tree as the MasterNode; operations performed by the MasterNode on the file directory tree are synchronized to the SecondaryMasterNode, and after applying them, the SecondaryMasterNode synchronizes the file directory tree to the Redis database cluster.
3. The distributed small file storage system of claim 2, wherein: the MasterNode's operations on the file directory tree generate an editslog file; at every interval T, the SecondaryMasterNode backs up the file directory tree, the resulting fsimage file is synchronized to the MasterNode, and the MasterNode clears the editslog entries recorded before the fsimage file was generated.
4. The distributed small file storage system of claim 3, wherein: the MasterNode's operations on the file directory tree comprise adding, deleting, querying and/or modifying data in the file directory tree.
5. The distributed small file storage system of claim 1, 2 or 3, wherein: the DataNodes communicate with each other via the gRPC protocol.
6. The distributed small file storage system of claim 1, 2 or 3, wherein: each DataNode sends heartbeat packets to the MasterNode via the gRPC protocol to report its own status.
7. The distributed small file storage system of claim 1, 2 or 3, wherein: the system further comprises a client, through which a user accesses the MasterNode to operate on the file directory tree and uploads files to and/or reads files from the DataNodes.
8. The distributed small file storage system of claim 7, wherein the client uploads a file to a DataNode by the following steps:
Step 1: the client sends a file upload request to the MasterNode;
Step 2: the MasterNode queries the file directory tree to determine whether the ID of the file being uploaded is already recorded. If it is, the MasterNode asks the client whether to overwrite the file and proceeds to the next step if the client confirms; if it is not, the MasterNode proceeds directly to the next step;
Step 3: the MasterNode queries the DataNode list and returns the location of the writable DataNode closest in network distance, the network distance being the communication distance between the client and the DataNode;
Step 4: the client establishes a pipeline with the returned DataNode; once the pipeline is established, the client streams the file data to the DataNode through a SocketStream.
9. The distributed small file storage system of claim 8, further comprising:
Step 5: as the DataNode receives the data, it writes the data to a file via an IO stream and synchronizes the data to a backup DataNode via a SocketStream;
Step 6: after the data has been written, the DataNode reports successful file writing to the MasterNode and the client.
10. The distributed small file storage system of claim 7, wherein the client reads a file from a DataNode by the following steps:
Step a: the client sends the name of the file it wants to read to the MasterNode;
Step b: the MasterNode looks up the location of the file by its file name;
Step c: the MasterNode returns the file location to the client, selected according to network distance, the network distance being the communication distance between the client and the DataNode;
Step d: the client reads the file from the DataNode at the returned location.
CN202110404012.0A, priority date 2021-04-15, filed 2021-04-15: Distributed small file storage system. Status: Pending; published as CN113076298A.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110404012.0A (CN113076298A) | 2021-04-15 | 2021-04-15 | Distributed small file storage system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110404012.0A (CN113076298A) | 2021-04-15 | 2021-04-15 | Distributed small file storage system

Publications (1)

Publication Number | Publication Date
CN113076298A | 2021-07-06

Family

Family ID: 76617776

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202110404012.0A | Pending | CN113076298A | 2021-04-15 | 2021-04-15 | Distributed small file storage system

Country Status (1)

Country | Link
CN (1) | CN113076298A

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115499426A * | 2022-07-29 | 2022-12-20 | 天翼云科技有限公司 | Method, device, equipment and medium for transmitting mass small files

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7966293B1 * | 2004-03-09 | 2011-06-21 | Netapp, Inc. | System and method for indexing a backup using persistent consistency point images
CN103853612A * | 2012-12-04 | 2014-06-11 | 中山大学深圳研究院 | Method for reading data based on digital family content under distributed storage
CN111399760A * | 2019-11-19 | 2020-07-10 | 杭州海康威视系统技术有限公司 | NAS cluster metadata processing method and device, NAS gateway and medium
CN111427841A * | 2020-02-26 | 2020-07-17 | 平安科技(深圳)有限公司 | Data management method and device, computer equipment and storage medium
CN112416889A * | 2020-10-27 | 2021-02-26 | 中科曙光南京研究院有限公司 | Distributed storage system

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination