CN103516549A - File system metadata log mechanism based on shared object storage - Google Patents
File system metadata log mechanism based on shared object storage
- Publication number
- CN103516549A (application number CN201310447799.4A)
- Authority
- CN
- China
- Prior art keywords
- metadata
- access device
- metadata log
- log
- write
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a file system metadata log mechanism based on shared object storage and belongs to the field of computer storage. The metadata logs of a file system are stored uniformly in an object storage system, and the mechanism mainly comprises the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager. Very long metadata logs can be stored, no circular-log control is needed, and system complexity is reduced.
Description
Technical field
The present invention relates to the field of computer storage, and specifically to a file system metadata log mechanism based on shared object storage.
Background
With the rapid development of network applications, the volume of network data keeps growing, and PB-scale mass data storage is becoming increasingly important. Traditional file systems cannot meet the large-capacity, high-reliability, and high-performance requirements of current applications, and distributed file systems have received wide attention as a way to meet these demands.
Existing distributed file system research mainly manages metadata separately from the actual data. Metadata requests account for more than 50% of all requests in a file system, so metadata management has become an important research direction in distributed file systems.
Metadata operations consist mainly of large numbers of random I/O operations, and on the dominant storage device today, the mechanical hard disk, random I/O performs far worse than sequential I/O. The main reason is that a mechanical disk must perform many seek operations when handling random I/O, and a seek is a mechanical operation that takes much longer than an electronic one. In this situation, a metadata log mechanism can greatly improve metadata performance.
The main idea of a metadata log mechanism is to convert random writes into sequential writes. Under this mechanism, a metadata update is divided into three steps: (1) write the metadata operation sequentially to the metadata log as a log entry; (2) update the metadata cache; (3) asynchronously write the dirty data in the cache back to the metadata data area. Once the second step completes, the server can report to the client that the metadata request is complete. The third step can run asynchronously at a suitable time, for example when system load is light. Because the first step is a sequential write, this method responds faster than updating the metadata in place and can significantly improve the efficiency of metadata operations.
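A minimal Python sketch of the three-step update follows. It assumes an in-memory list stands in for the sequential log and a dictionary stands in for the metadata data area; the class and field names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class JournaledMetadataStore:
    """Toy store illustrating the three-step journaled metadata update."""
    journal: list = field(default_factory=list)    # stands in for the sequential log
    cache: dict = field(default_factory=dict)      # in-memory metadata cache
    dirty: set = field(default_factory=set)        # keys changed since the last flush
    data_area: dict = field(default_factory=dict)  # stands in for the metadata data area

    def update(self, key, value):
        # Step 1: append the operation to the log (a sequential write).
        self.journal.append(("set", key, value))
        # Step 2: update the metadata cache and remember the key is dirty.
        self.cache[key] = value
        self.dirty.add(key)
        # The request can be acknowledged now; step 3 runs later.
        return "ack"

    def flush(self):
        # Step 3: write dirty cache entries back to the data area.
        # A real server would schedule this asynchronously under light load.
        for key in sorted(self.dirty):
            self.data_area[key] = self.cache[key]
        self.dirty.clear()


store = JournaledMetadataStore()
first_reply = store.update("/a", {"size": 1})
store.update("/b", {"size": 2})
data_area_before_flush = dict(store.data_area)
store.flush()
```

The client sees the acknowledgement after step 2, while the data area is only brought up to date when `flush` runs.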
However, a traditional metadata log usually resides on a local disk or in a local file, which causes the following problems: 1) The size of the metadata log is limited by local disk space. In clustered storage, the space a local disk can provide is very small relative to the free space of the whole system, so a circular-log control mechanism is usually needed to keep the log size within a certain range. 2) A local disk has no disaster tolerance. Supporting disaster tolerance for the local disk requires additional equipment such as RAID, and most such equipment offers no cross-node disaster tolerance.
Current distributed storage systems are gradually moving to object storage protocols, with the whole data cluster organizing its data as objects, so a new way of managing metadata logs is urgently needed.
Summary of the invention
The invention provides a file system metadata log mechanism based on shared object storage, in which the metadata logs of the file system are stored uniformly in an object storage system. The mechanism mainly comprises the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager.
Object storage accessor: this module implements access to the object storage system, including support for operations such as reading, writing, and deleting objects.
Metadata log accessor: built on the object storage accessor, this module wraps object operations as log operations and provides a virtual log file that has no length limit and supports append reads, append writes, and truncation.
Metadata log manager: this module manages the metadata log, including replaying the log during system startup and recovery and truncating the log while the system is running.
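One way a virtual log file without a length limit could be layered on an object store is sketched below in Python. The patent does not specify the mapping; the fixed object size, the object naming scheme, and the in-memory `InMemoryObjectStore` are assumptions made for illustration.

```python
class InMemoryObjectStore:
    """Stand-in for the object storage accessor: write, read, delete by name."""

    def __init__(self):
        self.objects = {}

    def write(self, name, offset, data):
        buf = self.objects.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, name, offset, length):
        return bytes(self.objects.get(name, b"")[offset:offset + length])

    def delete(self, name):
        self.objects.pop(name, None)


class VirtualLog:
    """Virtual log file striped over fixed-size objects (hypothetical layout)."""

    def __init__(self, store, prefix, object_size=8):
        self.store = store
        self.prefix = prefix
        self.object_size = object_size
        self.head = 0  # logical offset of the oldest retained byte
        self.tail = 0  # logical offset where the next append lands

    def _name(self, index):
        return f"{self.prefix}.{index:08d}"

    def append(self, data):
        start, pos = self.tail, 0
        while pos < len(data):
            index, off = divmod(self.tail, self.object_size)
            chunk = data[pos:pos + self.object_size - off]
            self.store.write(self._name(index), off, chunk)
            self.tail += len(chunk)
            pos += len(chunk)
        return start  # logical offset of the appended entry

    def read(self, offset, length):
        if offset < self.head or offset + length > self.tail:
            raise ValueError("range outside the retained log")
        out = bytearray()
        while length > 0:
            index, off = divmod(offset, self.object_size)
            n = min(length, self.object_size - off)
            out += self.store.read(self._name(index), off, n)
            offset += n
            length -= n
        return bytes(out)

    def truncate(self, new_head):
        """Drop bytes before new_head and delete objects that are wholly obsolete."""
        if not self.head <= new_head <= self.tail:
            raise ValueError("new head outside the retained log")
        first = self.head // self.object_size
        for index in range(first, new_head // self.object_size):
            self.store.delete(self._name(index))
        self.head = new_head


store = InMemoryObjectStore()
log = VirtualLog(store, "mds0.journal", object_size=8)
first_offset = log.append(b"hello world!")
second_offset = log.append(b"ABCDEFGH")
whole_log = log.read(0, 20)
log.truncate(second_offset)
```

Because the log only grows at the tail and shrinks at the head, truncation is a matter of deleting whole objects, and no wrap-around bookkeeping is required.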
The steps are performed as follows:
(1) Construct a metadata log entry.
(2) Submit the metadata log entry to the metadata log manager.
(3) The metadata log manager submits the log entry to the metadata log accessor.
(4) The metadata log accessor writes the log to cluster storage through the object storage accessor.
(5) After completing the write, the object storage accessor reports write completion to the metadata log accessor.
(6) The metadata log accessor reports write completion to the metadata log manager.
(7) The metadata log manager reports write completion to the metadata server.
(8) The metadata server updates its in-memory cache.
(9) The metadata server reports to the client that the metadata request is complete.
In this log mechanism, the metadata log manager periodically triggers the log truncation flow.
In this log mechanism, when the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over the service.
Both a metadata server cluster mode and a metadata server hot-standby mode are supported. In cluster mode, each metadata server uses its own log object group, selected by its server ID. In hot-standby mode, the primary and backup servers share one log object group: under normal conditions the primary server holds write permission on the group, and during a primary/backup switchover the backup server acquires it.
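The two modes can be illustrated with a small Python sketch. The patent does not describe how log object groups are named or how write permission is enforced, so the `mdlog.<id>` naming and the `LogGroupRegistry` class below are hypothetical.

```python
class LogGroupRegistry:
    """Tracks which server may currently write each log object group."""

    def __init__(self):
        self.writers = {}  # group name -> ID of the server holding write permission

    @staticmethod
    def cluster_group(server_id):
        # Cluster mode: one log object group per metadata server ID.
        return f"mdlog.{server_id}"

    def acquire(self, group, server_id):
        # The first server to ask becomes the writer; later requests from
        # other servers are refused until a switchover happens.
        holder = self.writers.setdefault(group, server_id)
        return holder == server_id

    def switch_over(self, group, new_writer):
        # Hot-standby mode: the backup takes over write permission.
        self.writers[group] = new_writer

    def can_write(self, group, server_id):
        return self.writers.get(group) == server_id


registry = LogGroupRegistry()
shared_group = "mdlog.shared"
primary_granted = registry.acquire(shared_group, "primary")
backup_granted = registry.acquire(shared_group, "backup")
registry.switch_over(shared_group, "backup")
```

A single writer per group keeps the append-only log consistent, since only one server ever advances its write position at a time.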
The beneficial effects of the invention are as follows. Very long metadata logs can be stored, with the log size limited only by the capacity of the object storage system. In addition, the metadata log can always be written in append mode without circular-log control, which reduces system complexity. Both the metadata server cluster mode and the metadata server hot-standby mode are supported.
Brief description of the drawings
Figure 1 is the system architecture diagram of the file system metadata log mechanism based on shared object storage.
Detailed description
The implementation and operation of the invention are described below with reference to the drawing and a specific example.
1. Metadata request flow
When the metadata server receives a metadata request from a client, it performs the following steps to complete the request.
(1) Construct a metadata log entry. The metadata server constructs the corresponding log entry data from information such as the request type, the data item requested, and the operation to be performed.
(2) Submit the metadata log entry to the metadata log manager. The metadata server calls the metadata log manager's log submission method to submit the entry. The metadata log manager checks the validity of the log entry and assigns it a number.
(3) The metadata log manager submits the log entry to the metadata log accessor. After completing this series of checks, the metadata log manager submits the entry to the metadata log accessor.
(4) The metadata log accessor writes the log to cluster storage through the object storage accessor. On receiving a log write request, the metadata log accessor calculates the start offset of the new entry from the current write position of the log, converts the log operation into an object operation, and executes that object operation through the object storage accessor.
(5) After completing the write, the object storage accessor reports write completion to the metadata log accessor. The object storage accessor is responsible for executing object operations: it communicates with the object storage devices and writes the data into the object storage cluster. This is an asynchronous operation; when it completes, the metadata log accessor is notified through a callback.
(6) The metadata log accessor reports write completion to the metadata log manager. On receiving the completion notice from the object storage accessor, the metadata log accessor updates the state of the whole MDS (metadata server) log, including the offset at which the next entry will be written. After the update, it reports write completion to the metadata log manager.
(7) The metadata log manager reports write completion to the metadata server. On receiving the write-completion message from the metadata log accessor, the metadata log manager reports a successful write to the metadata server.
(8) The metadata server updates its in-memory cache. On receiving the write-completion report, the metadata server knows that the metadata update has been persisted, and it updates the in-memory cache to apply the metadata operation.
(9) The metadata server reports to the client that the metadata request is complete. After the cache update, the metadata server reports to the client that the metadata request is complete.
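The nine steps above can be sketched as a chain of callbacks in Python. This is an illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, entries are encoded as JSON lines, the object store is a single in-memory buffer, and the completion callback fires immediately where a real accessor would fire it from an I/O completion. The sketch also assumes one append in flight at a time.

```python
import json


class ObjectStoreAccessor:
    def __init__(self):
        self.blob = bytearray()  # stands in for the object storage cluster

    def write_async(self, offset, data, on_done):
        self.blob[offset:offset + len(data)] = data
        on_done()  # step 5: report write completion through a callback


class LogAccessor:
    def __init__(self, objects):
        self.objects = objects
        self.write_pos = 0  # offset at which the next entry will be written

    def append(self, payload, on_done):
        offset = self.write_pos  # step 4: start offset of the new entry

        def written():
            self.write_pos = offset + len(payload)  # step 6: update log state
            on_done(offset)

        self.objects.write_async(offset, payload, written)


class LogManager:
    def __init__(self, accessor):
        self.accessor = accessor
        self.next_seq = 1

    def submit(self, entry, on_done):
        if not entry.get("op"):  # step 2: validity check
            raise ValueError("invalid log entry")
        seq = self.next_seq      # step 2: number the entry
        self.next_seq += 1
        payload = json.dumps({"seq": seq, **entry}, sort_keys=True).encode() + b"\n"
        # Step 3: hand the entry to the log accessor; step 7 runs in the callback.
        self.accessor.append(payload, lambda offset: on_done(seq))


class MetadataServer:
    def __init__(self, manager):
        self.manager = manager
        self.cache = {}

    def handle(self, op, path, attrs, reply):
        entry = {"op": op, "path": path, "attrs": attrs}  # step 1

        def persisted(seq):
            self.cache[path] = attrs         # step 8: update the in-memory cache
            reply({"ok": True, "seq": seq})  # step 9: answer the client

        self.manager.submit(entry, persisted)


replies = []
objects = ObjectStoreAccessor()
server = MetadataServer(LogManager(LogAccessor(objects)))
server.handle("mkdir", "/a", {"mode": 493}, replies.append)
server.handle("mkdir", "/b", {"mode": 448}, replies.append)
logged = [json.loads(line) for line in objects.blob.decode().splitlines()]
```

The cache is touched only inside the completion callback, so a client never sees an acknowledgement for an update that has not reached the log.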
2. Log truncation
The metadata log manager periodically triggers the log truncation flow.
Before performing truncation, it first assesses the current system load, including information such as CPU, network, and request queue length, to decide whether this is a suitable time to truncate the log. If the metadata log manager finds that the system load is currently high, it postpones the truncation flow.
If the current system load permits truncation, the metadata log manager starts flushing the dirty data in the cache to the metadata data area. After the flush succeeds, it notifies the metadata log accessor to truncate the log entries that have been written to the data device.
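A minimal Python sketch of one truncation round follows. The load thresholds, the `LoadSample` fields, and the function names are assumptions chosen for illustration; the patent names the load signals but gives no concrete values or policy.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    cpu: float         # fraction of CPU in use, 0.0 to 1.0
    network: float     # fraction of network bandwidth in use, 0.0 to 1.0
    queue_length: int  # number of requests waiting


def should_truncate(load, max_cpu=0.7, max_network=0.7, max_queue=100):
    # Truncate only when every load signal is under its (assumed) threshold.
    return (load.cpu <= max_cpu
            and load.network <= max_network
            and load.queue_length <= max_queue)


def truncation_round(load, dirty, data_area, journal):
    """Run one periodic round; return the number of log entries dropped."""
    if not should_truncate(load):
        return 0                 # system busy: postpone truncation
    covered = len(journal)       # entries already reflected in the cache
    data_area.update(dirty)      # flush dirty cache data to the data area
    dirty.clear()
    del journal[:covered]        # truncate entries that are now persisted
    return covered


journal = [("set", "/a"), ("set", "/b")]
dirty = {"/a": 1, "/b": 2}
data_area = {}
busy = LoadSample(cpu=0.95, network=0.2, queue_length=10)
idle = LoadSample(cpu=0.10, network=0.1, queue_length=0)
skipped = truncation_round(busy, dirty, data_area, journal)
journal_after_skip = list(journal)
dropped = truncation_round(idle, dirty, data_area, journal)
```

The log is cut only after the flush has succeeded, so an entry is never discarded before its effect is stored in the data area.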
3. Disaster tolerance
When the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over the service.
The backup metadata server reads the primary server's metadata log through the log accessor and replays it, bringing the backup server's cache to the same state the primary server had before the crash, and then continues processing client requests.
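Log replay on the backup server can be sketched in Python as follows. The JSON-lines entry format and the `set` and `delete` operation names are assumptions for illustration; the patent does not define the entry encoding.

```python
import json


def replay(journal_bytes):
    """Rebuild a metadata cache by applying log entries in order."""
    cache = {}
    for line in journal_bytes.splitlines():
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry["op"] == "set":
            cache[entry["path"]] = entry["attrs"]
        elif entry["op"] == "delete":
            cache.pop(entry["path"], None)
    return cache


# Log as the primary left it in shared object storage before crashing.
primary_journal = b"\n".join(
    json.dumps(entry).encode()
    for entry in [
        {"op": "set", "path": "/a", "attrs": {"size": 1}},
        {"op": "set", "path": "/b", "attrs": {"size": 2}},
        {"op": "delete", "path": "/a"},
    ]
)
backup_cache = replay(primary_journal)
```

Because the log lives in shared object storage and not on the primary's local disk, the backup can read it even when the primary node is unreachable.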
Claims (3)
1. A file system metadata log mechanism based on shared object storage, characterized in that the metadata logs of the file system are stored uniformly in an object storage system, the mechanism mainly comprising the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager;
the object storage accessor implements access to the object storage system, including support for operations such as reading, writing, and deleting objects;
the metadata log accessor, built on the object storage accessor, wraps object operations as log operations and provides a virtual log file that has no length limit and supports append reads, append writes, and truncation;
the metadata log manager manages the metadata log, including replaying the log during system startup and recovery and truncating the log while the system is running;
the steps are performed as follows:
(1) construct a metadata log entry,
(2) submit the metadata log entry to the metadata log manager,
(3) the metadata log manager submits the log entry to the metadata log accessor,
(4) the metadata log accessor writes the log to cluster storage through the object storage accessor,
(5) after completing the write, the object storage accessor reports write completion to the metadata log accessor,
(6) the metadata log accessor reports write completion to the metadata log manager,
(7) the metadata log manager reports write completion to the metadata server,
(8) the metadata server updates its in-memory cache,
(9) the metadata server reports to the client that the metadata request is complete.
2. The log mechanism according to claim 1, characterized in that the metadata log manager periodically triggers the log truncation flow.
3. The log mechanism according to claim 2, characterized in that when the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over the service.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310447799.4A CN103516549B (en) | 2013-09-27 | 2013-09-27 | File system metadata log mechanism based on shared object storage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310447799.4A CN103516549B (en) | 2013-09-27 | 2013-09-27 | File system metadata log mechanism based on shared object storage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103516549A true CN103516549A (en) | 2014-01-15 |
CN103516549B CN103516549B (en) | 2018-03-27 |
Family
ID=49898626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310447799.4A Active CN103516549B (en) | 2013-09-27 | 2013-09-27 | File system metadata log mechanism based on shared object storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103516549B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793517A (en) * | 2014-02-12 | 2014-05-14 | 浪潮电子信息产业股份有限公司 | File system log dump dynamic capacity-increase method based on monitoring mechanism |
CN103902479A (en) * | 2014-03-27 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Quick reconstruction mechanism for metadata cache on basis of metadata log |
CN104991739A (en) * | 2015-06-19 | 2015-10-21 | 中国科学院计算技术研究所 | Method and system for refining primary execution semantics during metadata server failure substitution |
CN106790563A (en) * | 2016-12-27 | 2017-05-31 | 浙江省公众信息产业有限公司 | Distributed memory system and method |
US10089338B2 (en) | 2014-12-12 | 2018-10-02 | International Business Machines Corporation | Method and apparatus for object storage |
CN109144413A (en) * | 2018-07-27 | 2019-01-04 | 郑州云海信息技术有限公司 | A kind of metadata management method and device |
CN109828862A (en) * | 2017-11-23 | 2019-05-31 | 成都华为技术有限公司 | A kind of method and apparatus playing back log |
CN110019063A (en) * | 2017-08-15 | 2019-07-16 | 厦门雅迅网络股份有限公司 | Method, terminal device and the storage medium of calculate node data disaster tolerance playback |
CN117891409A (en) * | 2024-03-13 | 2024-04-16 | 济南浪潮数据技术有限公司 | Data management method, device, equipment and storage medium for distributed storage system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1581091A (en) * | 2003-08-11 | 2005-02-16 | 株式会社日立制作所 | Multi-site remote-copy system |
US20090112789A1 (en) * | 2007-10-31 | 2009-04-30 | Fernando Oliveira | Policy based file management |
CN103049351A (en) * | 2012-12-13 | 2013-04-17 | 曙光信息产业(北京)有限公司 | Log processing method and device of multivariate data server |
CN103207883A (en) * | 2012-01-12 | 2013-07-17 | Lsi公司 | Method For Metadata Persistence |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793517A (en) * | 2014-02-12 | 2014-05-14 | 浪潮电子信息产业股份有限公司 | File system log dump dynamic capacity-increase method based on monitoring mechanism |
CN103902479A (en) * | 2014-03-27 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Quick reconstruction mechanism for metadata cache on basis of metadata log |
US10089338B2 (en) | 2014-12-12 | 2018-10-02 | International Business Machines Corporation | Method and apparatus for object storage |
CN104991739A (en) * | 2015-06-19 | 2015-10-21 | 中国科学院计算技术研究所 | Method and system for refining primary execution semantics during metadata server failure substitution |
CN104991739B (en) * | 2015-06-19 | 2018-05-01 | 中国科学院计算技术研究所 | Meta data server failure accurate method and system for once performing semanteme in taking over |
CN106790563B (en) * | 2016-12-27 | 2019-11-15 | 浙江省公众信息产业有限公司 | Distributed memory system and method |
CN106790563A (en) * | 2016-12-27 | 2017-05-31 | 浙江省公众信息产业有限公司 | Distributed memory system and method |
CN110019063A (en) * | 2017-08-15 | 2019-07-16 | 厦门雅迅网络股份有限公司 | Method, terminal device and the storage medium of calculate node data disaster tolerance playback |
CN110019063B (en) * | 2017-08-15 | 2022-07-05 | 厦门雅迅网络股份有限公司 | Method for computing node data disaster recovery playback, terminal device and storage medium |
CN109828862A (en) * | 2017-11-23 | 2019-05-31 | 成都华为技术有限公司 | A kind of method and apparatus playing back log |
CN109828862B (en) * | 2017-11-23 | 2023-08-22 | 成都华为技术有限公司 | Method and device for replaying log |
CN109144413A (en) * | 2018-07-27 | 2019-01-04 | 郑州云海信息技术有限公司 | A kind of metadata management method and device |
CN117891409A (en) * | 2024-03-13 | 2024-04-16 | 济南浪潮数据技术有限公司 | Data management method, device, equipment and storage medium for distributed storage system |
Also Published As
Publication number | Publication date |
---|---|
CN103516549B (en) | 2018-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103516549B (en) | File system metadata log mechanism based on shared object storage | |
US8689047B2 (en) | Virtual disk replication using log files | |
KR101833114B1 (en) | Fast crash recovery for distributed database systems | |
US8868858B2 (en) | Method and apparatus of continuous data backup and access using virtual machines | |
US10083093B1 (en) | Consistent replication in a geographically disperse active environment | |
EP3206128B1 (en) | Data storage method, data storage apparatus, and storage device | |
KR20150129839A (en) | System-wide checkpoint avoidance for distributed database systems | |
US9767015B1 (en) | Enhanced operating system integrity using non-volatile system memory | |
CN107832423B (en) | File reading and writing method for distributed file system | |
WO2019001521A1 (en) | Data storage method, storage device, client and system | |
US9760480B1 (en) | Enhanced logging using non-volatile system memory | |
CN103037004A (en) | Implement method and device of cloud storage system operation | |
US11614879B2 (en) | Technique for replicating oplog index among nodes of a cluster | |
CN110134551B (en) | Continuous data protection method and device | |
WO2022033269A1 (en) | Data processing method, device and system | |
CN110704431A (en) | Hierarchical storage management method for mass data | |
US10089220B1 (en) | Saving state information resulting from non-idempotent operations in non-volatile system memory | |
US20190050455A1 (en) | Adaptive page rendering for a data management system | |
US10210013B1 (en) | Systems and methods for making snapshots available | |
US11226875B2 (en) | System halt event recovery | |
CN105871987A (en) | High available system and method for data writing | |
US11263237B2 (en) | Systems and methods for storage block replication in a hybrid storage environment | |
US20070061530A1 (en) | Method for storage of digital data in a mainframe data center and associated device | |
US7987335B1 (en) | Techniques for virtualizing data | |
US20180316758A1 (en) | Method and apparatus for logical mirroring to a multi-tier target node |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |