CN103516549A - File system metadata log mechanism based on shared object storage - Google Patents

File system metadata log mechanism based on shared object storage

Info

Publication number
CN103516549A
Authority
CN
China
Prior art keywords
metadata
access device
metadata log
log
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310447799.4A
Other languages
Chinese (zh)
Other versions
CN103516549B (en)
Inventor
袁冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201310447799.4A priority Critical patent/CN103516549B/en
Publication of CN103516549A publication Critical patent/CN103516549A/en
Application granted granted Critical
Publication of CN103516549B publication Critical patent/CN103516549B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a file system metadata log mechanism based on shared object storage and belongs to the field of computer storage. The metadata logs of a file system are stored uniformly in an object storage system, and the mechanism mainly comprises the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager. Ultra-long metadata logs can be stored without circular (wrap-around) control, which reduces the complexity of the system.

Description

A file system metadata log mechanism based on shared object storage

Technical field
The present invention relates to the field of computer storage, and specifically to a file system metadata log mechanism based on shared object storage.
Background
With the rapid development of network applications, the volume of network data keeps growing, and PB-scale mass data storage is becoming increasingly important. Traditional file systems cannot meet the requirements of current applications for large capacity, high reliability, and high performance. To meet these demands, distributed file systems have received wide attention.
Existing distributed file systems generally manage metadata separately from file data. Metadata requests account for more than 50% of all requests in a file system, so metadata management has become an important research direction for distributed file systems.
Metadata operations consist mainly of large numbers of random I/O operations. On the mechanical hard disk, which is currently the main storage device, random I/O performs considerably worse than sequential I/O. This is mainly because random I/O requires many seek operations, and seeks are mechanical operations that take far longer than electronic ones. A metadata log mechanism can substantially improve metadata performance in this situation.
The main idea of a metadata log mechanism is to convert random writes into sequential writes. Under this mechanism a metadata update takes three steps: (1) the metadata operation is written sequentially to the metadata log as a log entry; (2) the metadata cache is updated; (3) the dirty data in the cache is asynchronously written back to the metadata data area. Once the second step completes, the metadata request can be reported to the client as complete. The third step can be performed asynchronously at a suitable time, for example when the system is lightly loaded. Because the first step is a sequential write, this method responds faster than updating the metadata in place and can significantly improve the efficiency of metadata operations.
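The three-step update can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the patent's implementation: the names `JournaledMetadataStore`, `update`, and `flush` are hypothetical, and Python lists and dicts stand in for the on-disk log and the metadata data area.

```python
class JournaledMetadataStore:
    """Toy model of the three-step journaled metadata update (hypothetical)."""

    def __init__(self):
        self.log = []        # append-only log (stand-in for sequential disk writes)
        self.cache = {}      # in-memory metadata cache
        self.dirty = set()   # keys updated in the cache but not yet written back
        self.data_area = {}  # final metadata location (random-access writes)

    def update(self, key, value):
        # Step 1: append the operation to the log sequentially.
        self.log.append((key, value))
        # Step 2: update the cache; the request can now be acknowledged.
        self.cache[key] = value
        self.dirty.add(key)
        return "ack"

    def flush(self):
        # Step 3: write dirty entries back to the data area later,
        # for example when the system is lightly loaded.
        for key in sorted(self.dirty):
            self.data_area[key] = self.cache[key]
        self.dirty.clear()


store = JournaledMetadataStore()
store.update("/a", {"size": 1})
store.update("/a", {"size": 2})
store.flush()
```

Note that two updates to the same key cost two sequential log appends but only one write to the data area, which is where the saving over in-place updates comes from in this sketch.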
However, a traditional metadata log is usually kept on a local disk or in a local file, which causes the following problems: 1) The size of the metadata log is limited by local disk space. In cluster storage, the space a local disk can provide is very small relative to the free space of the whole system, so a circular (wrap-around) control mechanism is usually needed to keep the log size within a certain range. 2) A local disk has no fault tolerance. Supporting fault tolerance for a local disk requires additional equipment such as RAID, and most such equipment cannot tolerate faults across nodes.
Distributed storage systems are gradually moving to object storage protocols, in which the whole data cluster organizes data as objects, so a new way of managing the metadata log is urgently needed.
Summary of the invention
The invention provides a file system metadata log mechanism based on shared object storage, in which the metadata logs of the file system are stored uniformly in an object storage system. The mechanism mainly comprises the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager.
Object storage accessor: this module implements access to the object storage system, including support for operations such as reading, writing, and deleting objects.
Metadata log accessor: built on the object storage accessor, this module wraps object operations as log operations and provides a virtual log file. The log file has no length limit and supports append-read, append-write, and truncation operations.
Metadata log manager: this module manages the metadata log, including replaying the log during system startup and recovery and truncating the log while the system is running.
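One way a length-unlimited virtual log file could be layered over fixed-size objects is sketched below. The patent does not specify the layout, so the object size, the `journal.<index>` naming scheme, and the function `split_append` are all assumptions made for illustration.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed fixed object size (4 MiB)


def split_append(log_offset, length, object_size=OBJECT_SIZE):
    """Map a log write of `length` bytes at `log_offset` onto object writes.

    Returns a list of (object_name, offset_in_object, chunk_length) tuples.
    The naming scheme "journal.<index>" is hypothetical.
    """
    writes = []
    pos = log_offset
    remaining = length
    while remaining > 0:
        # Which object holds this position, and where inside it?
        index, in_obj = divmod(pos, object_size)
        # Write up to the end of the current object, then move to the next.
        chunk = min(remaining, object_size - in_obj)
        writes.append((f"journal.{index}", in_obj, chunk))
        pos += chunk
        remaining -= chunk
    return writes
```

Under this assumed layout, the log grows simply by creating the next object, and truncating the head of the log amounts to deleting whole objects that lie entirely before the first live offset.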
The steps are performed as follows:
(1) A metadata log entry is constructed.
(2) The metadata log entry is submitted to the metadata log manager.
(3) The metadata log manager submits the log entry to the metadata log accessor.
(4) The metadata log accessor writes the log into cluster storage through the object storage accessor.
(5) After completing the write operation, the object storage accessor reports write completion to the metadata log accessor.
(6) The metadata log accessor reports write completion to the metadata log manager.
(7) The metadata log manager reports write completion to the metadata server.
(8) The metadata server updates the memory cache.
(9) The metadata server reports completion of the metadata request to the client.
In the described log mechanism, the metadata log manager periodically triggers the log truncation flow.
In the described log mechanism, when the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over service.
The mechanism supports both a metadata server cluster mode and a metadata server hot-standby mode. In cluster mode, each metadata server uses a different log object group, determined by its server ID. In hot-standby mode, the primary and backup servers share the same log object group; under normal conditions the primary server holds write permission on the log object group, and on a primary/backup switchover the backup server acquires it.
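The difference between the two modes can be illustrated with a short sketch. The group naming scheme, the `pair_id` parameter, and the `WriteLease` class are hypothetical; the patent states only that cluster-mode servers map to different log object groups by server ID and that hot-standby servers share one group whose write permission moves on switchover.

```python
def log_group_for(server_id, mode, pair_id=None):
    """Return the name of the log object group a metadata server uses.

    mode == "cluster": each server gets its own group keyed by server ID.
    mode == "hot_standby": primary and backup share one group keyed by pair_id.
    """
    if mode == "cluster":
        return f"mdslog.{server_id}"
    if mode == "hot_standby":
        if pair_id is None:
            raise ValueError("hot_standby mode needs a pair_id")
        return f"mdslog.pair.{pair_id}"
    raise ValueError(f"unknown mode: {mode}")


class WriteLease:
    """Tracks which server currently holds write permission on a log group."""

    def __init__(self, holder):
        self.holder = holder

    def can_write(self, server_id):
        return server_id == self.holder

    def fail_over(self, backup_id):
        # On a primary/backup switchover the backup takes the write permission.
        self.holder = backup_id
```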
The beneficial effects of the invention are as follows. Ultra-long metadata logs can be stored, with the log size limited only by the capacity of the object storage system. In addition, the metadata log can always be written in append mode without circular control, which reduces the complexity of the system. Both the metadata server cluster mode and the metadata server hot-standby mode are supported.
Brief description of the drawings
Figure 1 is the system architecture diagram of the file system metadata log mechanism based on shared object storage.
Detailed description
The implementation and operation of the invention are described below with reference to the drawing and a specific example.
1. Metadata request flow
When the metadata server receives a metadata request from a client, it performs the following steps to complete the request.
(1) Construct a metadata log entry. The metadata server builds the corresponding log entry data from information such as the request type, the data items of the request, and the operation to be performed.
(2) Submit the metadata log entry to the metadata log manager. The metadata server calls the log submission method of the metadata log manager to submit the entry. The metadata log manager checks the validity of the log entry and assigns it a number.
(3) The metadata log manager submits the log entry to the metadata log accessor. After completing its series of verification operations, the metadata log manager submits the log entry to the metadata log accessor.
(4) The metadata log accessor writes the log into cluster storage through the object storage accessor. On receiving the log write request, the metadata log accessor computes the start offset of the new entry from the current write position of the log, converts the log operation into object operations, and executes them through the object storage accessor.
(5) After completing the write operation, the object storage accessor reports write completion to the metadata log accessor. The object storage accessor executes object operations, communicates with the object storage devices, and writes the data into the object storage cluster. This operation is asynchronous; when it completes, the metadata log accessor is notified through a callback.
(6) The metadata log accessor reports write completion to the metadata log manager. On receiving the completion notification from the object storage accessor, the metadata log accessor updates the state of the current MDS log as a whole, including the offset at which the next entry will be written. After this update, it reports write completion to the metadata log manager.
(7) The metadata log manager reports write completion to the metadata server. On receiving the write-completion message from the metadata log accessor, the metadata log manager reports a successful write to the metadata server.
(8) The metadata server updates the memory cache. When the metadata server receives the report that the log write is complete, it knows that the metadata update operation has been persisted, and it updates the memory cache to apply the metadata operation.
(9) The metadata server reports completion of the metadata request to the client. After the cache update, the metadata server reports to the client that the metadata request is complete.
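The nine steps above can be traced in a compact sketch. All class and method names here are hypothetical, a dict of byte buffers stands in for the object storage cluster, and the completion callback fires immediately, whereas the patent describes the object write as asynchronous.

```python
class ObjectStorageAccessor:
    """Stand-in for the object storage client; keeps objects in a dict."""

    def __init__(self):
        self.objects = {}

    def write(self, name, offset, data, on_complete):
        buf = self.objects.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        on_complete()  # step 5: completion callback (immediate in this sketch)


class MetadataLogAccessor:
    def __init__(self, storage, object_name="mdslog.0"):
        self.storage = storage
        self.object_name = object_name
        self.write_pos = 0  # offset at which the next entry will be written

    def append(self, data, on_complete):
        offset = self.write_pos  # step 4: start offset of the new entry

        def done():
            self.write_pos = offset + len(data)  # step 6: update log state
            on_complete()

        self.storage.write(self.object_name, offset, data, done)


class MetadataLogManager:
    def __init__(self, accessor):
        self.accessor = accessor
        self.next_seq = 0

    def submit(self, payload, on_complete):
        if not payload:  # step 2: validity check
            raise ValueError("empty log entry")
        seq = self.next_seq  # step 2: number the entry
        self.next_seq += 1
        record = f"{seq}:{payload}\n".encode()
        # Step 3: hand the entry to the log accessor; step 7: report upward.
        self.accessor.append(record, lambda: on_complete(seq))


class MetadataServer:
    def __init__(self, manager):
        self.manager = manager
        self.cache = {}

    def handle(self, key, value):
        replies = []
        payload = f"set {key}={value}"  # step 1: construct the log entry

        def persisted(seq):
            self.cache[key] = value      # step 8: update the memory cache
            replies.append(("ok", seq))  # step 9: report completion

        self.manager.submit(payload, persisted)
        return replies[0]  # valid only because callbacks run synchronously here


storage = ObjectStorageAccessor()
server = MetadataServer(MetadataLogManager(MetadataLogAccessor(storage)))
```

The point of the ordering is that the cache is touched only inside the completion callback, so a request is never acknowledged before its log entry has reached the object store.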
2. Log truncation
The metadata log manager periodically triggers the log truncation flow.
Before performing truncation, it first assesses the current system load, including information such as CPU, network, and request wait-queue length, to decide whether this is a suitable time for log truncation. If the metadata log manager finds that the current system load is high, it temporarily suspends the log truncation flow.
If the current system load permits log truncation, the metadata log manager begins flushing the dirty data in the cache to the metadata data area. After the flush succeeds, it notifies the metadata log accessor to truncate the log entries that have already been written to the data device.
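A single timer tick of this truncation flow might look like the sketch below. The threshold values, the function names, and the in-memory data structures are assumptions; the patent names the load signals (CPU, network, request queue length) but gives no concrete limits.

```python
def should_truncate(cpu_load, net_load, queue_len,
                    max_cpu=0.7, max_net=0.7, max_queue=100):
    """Return True when the system is idle enough to truncate the log.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    return cpu_load <= max_cpu and net_load <= max_net and queue_len <= max_queue


def truncate_tick(log, cache, dirty, data_area, load):
    """Run one periodic truncation attempt.

    log: list of log entries; dirty: set of cache keys pending write-back;
    load: (cpu_load, net_load, queue_len). Returns the number of entries dropped.
    """
    if not should_truncate(*load):
        return 0  # postpone until a later timer tick
    for key in sorted(dirty):  # flush dirty cache entries to the data area first
        data_area[key] = cache[key]
    dirty.clear()
    dropped = len(log)
    log.clear()  # then truncate the entries whose effects are now persisted
    return dropped
```

Flushing before truncating matters: an entry may be dropped from the log only once its effect is durable in the data area, otherwise a crash between the two operations could lose the update.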
3. Fault tolerance
When the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over service.
The backup metadata server reads the primary server's metadata log through the log accessor and replays the log, bringing the backup server's cache state into agreement with the primary server's state before the crash. It then continues to process client requests.
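Log replay on takeover can be sketched as applying the entries in sequence order to rebuild the cache. The entry format `(seq, op, key, value)` and the two operation types are assumptions for illustration; the patent does not define the log entry format.

```python
def replay(log_entries, cache=None):
    """Rebuild the metadata cache by applying log entries in sequence order.

    Each entry is a hypothetical (seq, op, key, value) tuple, with op being
    "set" or "delete".
    """
    cache = {} if cache is None else cache
    for seq, op, key, value in sorted(log_entries, key=lambda e: e[0]):
        if op == "set":
            cache[key] = value
        elif op == "delete":
            cache.pop(key, None)
        else:
            raise ValueError(f"unknown op in entry {seq}: {op}")
    return cache
```

Because the log lives in shared object storage rather than on the failed server's local disk, the backup server can read it directly after the primary crashes.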

Claims (3)

1. A file system metadata log mechanism based on shared object storage, characterized in that the metadata logs of the file system are stored uniformly in an object storage system, the mechanism mainly comprising the following modules: an object storage accessor, a metadata log accessor, and a metadata log manager;
the object storage accessor implements access to the object storage system, including support for operations such as reading, writing, and deleting objects;
the metadata log accessor, built on the object storage accessor, wraps object operations as log operations and provides a virtual log file, the log file having no length limit and supporting append-read, append-write, and truncation operations;
the metadata log manager manages the metadata log, including replaying the log during system startup and recovery and truncating the log while the system is running;
the steps are performed as follows:
(1) a metadata log entry is constructed,
(2) the metadata log entry is submitted to the metadata log manager,
(3) the metadata log manager submits the log entry to the metadata log accessor,
(4) the metadata log accessor writes the log into cluster storage through the object storage accessor,
(5) after completing the write operation, the object storage accessor reports write completion to the metadata log accessor,
(6) the metadata log accessor reports write completion to the metadata log manager,
(7) the metadata log manager reports write completion to the metadata server,
(8) the metadata server updates the memory cache;
(9) the metadata server reports completion of the metadata request to the client.
2. The log mechanism according to claim 1, characterized in that the metadata log manager periodically triggers the log truncation flow.
3. The log mechanism according to claim 2, characterized in that when the primary metadata server crashes, the monitoring module notifies the backup metadata server to take over service.
CN201310447799.4A 2013-09-27 2013-09-27 A kind of file system metadata log mechanism based on shared object storage Active CN103516549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310447799.4A CN103516549B (en) 2013-09-27 2013-09-27 A kind of file system metadata log mechanism based on shared object storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310447799.4A CN103516549B (en) 2013-09-27 2013-09-27 A kind of file system metadata log mechanism based on shared object storage

Publications (2)

Publication Number Publication Date
CN103516549A true CN103516549A (en) 2014-01-15
CN103516549B CN103516549B (en) 2018-03-27

Family

ID=49898626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310447799.4A Active CN103516549B (en) 2013-09-27 2013-09-27 A kind of file system metadata log mechanism based on shared object storage

Country Status (1)

Country Link
CN (1) CN103516549B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1581091A (en) * 2003-08-11 2005-02-16 株式会社日立制作所 Multi-site remote-copy system
US20090112789A1 (en) * 2007-10-31 2009-04-30 Fernando Oliveira Policy based file management
CN103207883A (en) * 2012-01-12 2013-07-17 Lsi公司 Method For Metadata Persistence
CN103049351A (en) * 2012-12-13 2013-04-17 曙光信息产业(北京)有限公司 Log processing method and device of multivariate data server

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793517A (en) * 2014-02-12 2014-05-14 浪潮电子信息产业股份有限公司 File system log dump dynamic capacity-increase method based on monitoring mechanism
CN103902479A (en) * 2014-03-27 2014-07-02 浪潮电子信息产业股份有限公司 Quick reconstruction mechanism for metadata cache on basis of metadata log
US10089338B2 (en) 2014-12-12 2018-10-02 International Business Machines Corporation Method and apparatus for object storage
CN104991739A (en) * 2015-06-19 2015-10-21 中国科学院计算技术研究所 Method and system for refining primary execution semantics during metadata server failure substitution
CN104991739B (en) * 2015-06-19 2018-05-01 中国科学院计算技术研究所 Meta data server failure accurate method and system for once performing semanteme in taking over
CN106790563A (en) * 2016-12-27 2017-05-31 浙江省公众信息产业有限公司 Distributed memory system and method
CN106790563B (en) * 2016-12-27 2019-11-15 浙江省公众信息产业有限公司 Distributed memory system and method
CN110019063A (en) * 2017-08-15 2019-07-16 厦门雅迅网络股份有限公司 Method, terminal device and the storage medium of calculate node data disaster tolerance playback
CN110019063B (en) * 2017-08-15 2022-07-05 厦门雅迅网络股份有限公司 Method for computing node data disaster recovery playback, terminal device and storage medium
CN109828862A (en) * 2017-11-23 2019-05-31 成都华为技术有限公司 A kind of method and apparatus playing back log
CN109828862B (en) * 2017-11-23 2023-08-22 成都华为技术有限公司 Method and device for replaying log
CN109144413A (en) * 2018-07-27 2019-01-04 郑州云海信息技术有限公司 A kind of metadata management method and device

Also Published As

Publication number Publication date
CN103516549B (en) 2018-03-27

Similar Documents

Publication Publication Date Title
CN103516549B (en) A kind of file system metadata log mechanism based on shared object storage
US8689047B2 (en) Virtual disk replication using log files
KR101833114B1 (en) Fast crash recovery for distributed database systems
US8868858B2 (en) Method and apparatus of continuous data backup and access using virtual machines
KR101827239B1 (en) System-wide checkpoint avoidance for distributed database systems
US10083093B1 (en) Consistent replication in a geographically disperse active environment
US8407182B1 (en) Systems and methods for facilitating long-distance live migrations of virtual machines
EP3206128B1 (en) Data storage method, data storage apparatus, and storage device
US9767015B1 (en) Enhanced operating system integrity using non-volatile system memory
WO2019001521A1 (en) Data storage method, storage device, client and system
US9760480B1 (en) Enhanced logging using non-volatile system memory
US20170371778A1 (en) Reliable Distributed Messaging Using Non-Volatile System Memory
CN103037004A (en) Implement method and device of cloud storage system operation
CN107832423B (en) File reading and writing method for distributed file system
US11614879B2 (en) Technique for replicating oplog index among nodes of a cluster
CN110704431A (en) Hierarchical storage management method for mass data
US10089220B1 (en) Saving state information resulting from non-idempotent operations in non-volatile system memory
US10210013B1 (en) Systems and methods for making snapshots available
CN110134551B (en) Continuous data protection method and device
US11226875B2 (en) System halt event recovery
CN105871987A (en) High available system and method for data writing
WO2022033269A1 (en) Data processing method, device and system
US11263237B2 (en) Systems and methods for storage block replication in a hybrid storage environment
US20070061530A1 (en) Method for storage of digital data in a mainframe data center and associated device
US20190050455A1 (en) Adaptive page rendering for a data management system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant