WO2015184925A1 - Data processing method for distributed file system and distributed file system - Google Patents


Info

Publication number
WO2015184925A1
Authority
WO
WIPO (PCT)
Prior art keywords
fas
flr
data
file
fac
Application number
PCT/CN2015/076473
Other languages
French (fr)
Chinese (zh)
Inventor
朱鹏
林健
胡剑华
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2015184925A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • This document relates to the field of distributed file storage technology, and in particular to a data processing method for a distributed file system and to a distributed file system.
  • A distributed file system can deliver high throughput, several times that of an ordinary local file system, together with high reliability: multiple-copy and redundant-copy techniques keep data reliable when a single machine fails. Compared with devices such as disk arrays, it also has the advantages of low cost and general-purpose hardware.
  • This document provides a data processing method for a distributed file system, and a distributed file system, that avoid the data inconsistency between multiple copies caused by an FAS going down and restarting.
  • A data processing method for a distributed file system comprises:
  • the FAC obtains file data and pushes it to the FAS;
  • the FAS records the file data pushed by the FAC, records in a buffer the modification that this push makes to the corresponding metadata on the FAS, writes the log file, and returns a file data push completion message to the FAC;
  • after receiving the file data push completion message returned by the FAS, the FAC sends a metadata modification change request to the FLR;
  • the FLR modifies the corresponding metadata according to the metadata modification change request and records the modification in the log file system;
  • when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • The step in which the FLR modifies the corresponding metadata according to the metadata modification change request and records the modification in the log file system further includes:
  • the FLR adds the related processed entries, in chronological order, to the buffer of the corresponding FAS.
  • The step in which the FLR rolls back the corresponding modified data according to the log records and completes the repair of the log file system includes:
  • when an abnormal restart of the FAS is detected, the FLR rolls back, according to the log records, the modified data recorded over a set length of time counting back from the current time point of the log, where the modified data for that set length of time corresponds to all modification records of the FAS;
  • when the FAS powers on, it sends a rollback request to the FLR to roll back the corresponding data; the FLR rolls back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing the repair of the log file system.
  • The step in which the FLR monitors whether the FAS is abnormal includes:
  • the FLR receives heartbeat messages periodically sent by the FAS, and determines that the FAS is abnormal when several consecutive heartbeat messages are lost.
  • The step of sending a metadata modification change request to the FLR includes:
  • after receiving the file data push completion message returned by the FAS, the FAC fills the corresponding metadata modification change request into the to-be-notified modification buffer; when the set timer expires, the FAC sends all metadata modification change requests in that buffer to the FLR.
  • An embodiment of the invention further provides a distributed file system, comprising an FAC, an FAS and an FLR, wherein:
  • the FAC is configured to obtain file data and push it to the FAS;
  • the FAS is configured to record the file data pushed by the FAC, record in a buffer the modification of the corresponding metadata on the FAS, write the log file, and return a file data push completion message to the FAC;
  • the FAC is further configured to send a metadata modification change request to the FLR after receiving the file data push completion message returned by the FAS;
  • the FLR is configured to modify the corresponding metadata according to the metadata modification change request and record the modification in the log file system;
  • the FLR is further configured to roll back the corresponding modified data according to the log records when an abnormal restart of the FAS is detected, completing the repair of the log file system.
  • The FLR is further configured to add the related processed entries, in chronological order, to the buffer of the corresponding FAS.
  • The FLR is configured to roll back, when an abnormal restart of the FAS is detected, the modified data recorded in the log over a set length of time counting back from the current time point of the log, where the modified data for that set length of time corresponds to all modification records of the FAS;
  • the FAS is configured to send a rollback request to the FLR when it powers on, to roll back the corresponding data;
  • the FLR is configured to roll back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing the repair of the log file system.
  • The FLR is configured to receive heartbeat messages periodically sent by the FAS, and to determine that the FAS is abnormal when several consecutive heartbeat messages are lost.
  • The FAC is configured to fill the corresponding metadata modification change request into the to-be-notified modification buffer after receiving the file data push completion message returned by the FAS, and to send all metadata modification change requests in that buffer to the FLR when the set timer expires.
  • A computer-readable storage medium stores computer-executable instructions used to perform the above method.
  • The FAC obtains file data and pushes it to the FAS; the FAS records the pushed file data, records in a buffer the corresponding metadata modification on the FAS, writes the log file, and returns a file data push completion message to the FAC; after receiving that message, the FAC sends a metadata modification change request to the FLR; the FLR modifies the corresponding metadata according to the request and records the modification in the log file system; and when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This ensures that files end up highly consistent after the distributed file system resets and restarts, avoids data inconsistency between multiple copies caused by machine downtime, and minimizes the delay and performance loss introduced by adding the log system.
  • FIG. 1 is a schematic flow chart of an embodiment of a data processing method of a distributed file system according to the present invention
  • FIG. 2 is a schematic diagram of an interaction process between a FAC, a FAS, and an FLR according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the interaction between the FAC and the FAS, and of the FAS flush-write timing, according to an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a process for a FAC to send a metadata modification change request to an FLR according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a processing flow of an FLR according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of a distributed file system according to the present invention.
  • The solution of the embodiment of the present invention includes: the FAC acquires file data and pushes it to the FAS; the FAS records the file data pushed by the FAC, records in a buffer the modification of the corresponding metadata on the FAS, writes the log file, and returns a file data push completion message to the FAC; after receiving that message, the FAC sends a metadata modification change request to the FLR; the FLR modifies the corresponding metadata according to the request and records it in the log file system; and when the FAS restarts abnormally, the FLR rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • This ensures that files end up highly consistent after the distributed file system resets and restarts, avoids data inconsistency between multiple copies caused by machine restarts, and minimizes the delay and performance loss introduced by adding the log system.
  • The system operating environment of the method embodiment includes an FAC, an FAS and an FLR, where:
  • FAC File Access Client, also known as File Service Client
  • FAS File Access Server, also known as File Data Server
  • FLR File Location Register
  • The solution of this embodiment provides a log file system mode for two-layer metadata. It provides all the characteristics of a lagging log file system without reducing file system responsiveness, and ensures high file consistency after the system resets and restarts.
  • The ext2 file system (Second extended file system) is a general-purpose file system without journaling. If it is reset or loses power, data being written or modified at that moment is likely to be lost, leaving metadata and data inconsistent. The ext3 file system (Third extended file system) addresses this by adding a journal: at power-on, it restores file system consistency by replaying the journal.
  • The two-layer metadata in this embodiment means that both the FLR and the FAS hold metadata: the FLR holds the location-name information for file segment data, and the FAS stores the correspondence between slice names and actual disk blocks.
  • Any distributed file system that manages its metadata on top of a local file system falls into this category of two-layer-metadata distributed file systems.
  • The role of the FAC is to send the related metadata modification requests; for this it can rely on the existing functions of the original distributed file system.
  • The FAS part is built on the lower layer of the two-layer metadata. It ensures that a valid metadata modification log can be constructed on the FAS, guaranteeing consistency on the FAS side.
  • The FLR part is built on the upper layer of the two-layer metadata and mainly handles log replay and rollback after modifications to the upper metadata layer.
  • an embodiment of the present invention provides a data processing method for a distributed file system, including:
  • Step S101 the FAC obtains file data and pushes it to the FAS;
  • Step S102 the FAS records the file data pushed by the FAC, records the modification of the corresponding metadata on the FAS in the buffer, writes the log file, and returns a file data push completion message to the FAC;
  • The FAS periodically flushes the modification buffer to the ordinary log file, ahead of the data.
  • After the data has been flushed to disk, the successful metadata modification is recorded in the buffer, which is periodically flushed to the log file.
  • The interaction between the FAC and the FAS, and the FAS flush-write timing, can be as shown in FIG. 3:
  • the FAC sends data a to the FAS;
  • the FAS inserts a modification notice for data a into the modification buffer;
  • the FAS writes data a to the data buffer;
  • the FAS returns to the FAC, informing it that data a has been written successfully (after this point, the metadata modification notification is sent to the FLR);
  • the FAC sends data b to the FAS;
  • the FAS inserts a modification notice for data b into the modification buffer;
  • the FAS writes data b to the data buffer;
  • the FAS returns to the FAC, informing it that data b has been written successfully (steps 5-8 concern different data and illustrate how fast the asynchronous notification is);
  • the timed log task runs and writes the modification notices for a and b to disk;
  • the data of b is written to disk;
  • the timed log task runs and writes the write-completion notices for a and b to disk.
  • Step S103 after receiving the file data push completion message returned by the FAS, the FAC sends a metadata modification change request to the FLR;
  • After receiving the file data push completion message returned by the FAS, the FAC sends a metadata modification change request to the FLR, with the related log file system data attached to the request.
  • After receiving the file data push completion message returned by the FAS, the FAC fills the corresponding metadata modification change request into the to-be-notified modification buffer.
  • Taking as an example the FAC sending metadata modification change requests for data a, b, c and d to the FLR, the flow by which the FAC sends metadata modification change requests to the FLR can be as shown in FIG. 4.
  • Sending of a metadata synchronization message to the FLR is triggered, and the timer is reset.
  • After a period of time the timer fires, the messages in the to-be-notified buffer are sent to the FLR, and the timer is reset. This greatly reduces the number of messages the FLR must handle while keeping notification close to real time, within a short interval.
  • Step S104: the FLR modifies the corresponding metadata according to the metadata modification change request and records the modification in the log file system;
  • the FLR modifies the corresponding metadata and, using the log-related data attached to the request, records the metadata modification in the log system.
  • The FAS flushes the data to disk and writes the log after confirming that the write succeeded.
  • the FLR adds the relevant processed entries to the buffer of the corresponding FAS in chronological order.
  • Step S105: when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • the FLR monitors whether the FAS is abnormal by receiving a heartbeat message periodically sent by the FAS.
  • the FAS periodically sends a still alive message to indicate that the FAS is still working.
  • When heartbeat messages from the FAS are received, the FLR determines that the FAS is normal; when heartbeat messages are lost several times in a row, it determines that the FAS is abnormal.
  • The FLR takes no action on an isolated lost heartbeat, but when heartbeats are lost consecutively, it flags the FAS that lost them as lagging, so that if the FAS has really gone down and reset, the related operations are rolled back.
  • The FLR performs the rollback according to the log records, that is, it rolls back the modified data recorded in the log over a set length of time counting back from the current time point.
  • The modified data for that set length of time corresponds to all modification records of the FAS, that is, the data modifications reported by the FAC.
  • When the FAS powers on, it sends a rollback request to the FLR to roll back the corresponding data; the FLR rolls back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing the repair of the log file system.
  • the processing flow of the FLR in this embodiment can be as shown in FIG. 5.
  • When one of the FASs restarts abnormally, the log system enters the repair process. The process is first triggered on the FLR: once the FLR confirms that an FAS has restarted, the log system uses the log records on the FLR to roll back all modification records corresponding to that FAS over a certain length of time. Meanwhile, when the FAS powers on, it uses its own log records to roll back the related data that was written to the FAS but not yet to disk, and sends a rollback request to the FLR to roll back the corresponding data.
  • the system of this embodiment can provide all the characteristics of the lagging log file system without reducing the response of the file system, and ensure high consistency of the file after the system resets and restarts.
  • The FAC obtains file data and pushes it to the FAS; the FAS records the pushed file data, records in a buffer the modification of the corresponding metadata on the FAS, writes the log file, and returns a file data push completion message to the FAC; after receiving that message, the FAC sends a metadata modification change request to the FLR; the FLR modifies the corresponding metadata according to the request and records it in the log file system; when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records and completes the repair of the log file system.
  • This ensures that files end up highly consistent after the distributed file system resets and restarts, avoids data inconsistency between multiple copies caused by machine downtime, and minimizes the delay and performance loss introduced by adding the log system.
  • The log system of this embodiment is insensitive to the scale of the distributed system: its load is constant and does not increase as the cluster expands.
  • The load on the disk holding the log is extremely small; this is a high-performance, low-latency log file system obtained at the expense of a high error rate.
  • An embodiment of the present invention provides a distributed file system, including FAC 201, FAS 202 and FLR 203, where:
  • the FAC 201 is configured to: obtain file data, and send it to the FAS 202;
  • the FAS 202 is configured to record the file data pushed by the FAC 201, record in a buffer the modification of the corresponding metadata on the FAS 202, write the log file, and return a file data push completion message to the FAC 201;
  • the FAC 201 is further configured to: after receiving the file data push completion message returned by the FAS 202, send a metadata modification change request to the FLR 203;
  • the FLR 203 is configured to modify the corresponding metadata according to the metadata modification change request and record the modification in the log file system;
  • the FLR 203 is further configured to: when it is detected that the FAS 202 is abnormally restarted, perform a rollback operation of the corresponding modified data according to the log record, and complete the repair of the log file system.
  • FAC 201, the file service client, is configured to provide the connection between the user and the data inside the distributed file system.
  • FAS 202, the file data server, is configured to store the actual file data.
  • FLR 203, the file location register, is configured to store the metadata related to the data.
  • The solution of this embodiment provides a log file system mode for two-layer metadata. It provides all the characteristics of a lagging log file system without reducing file system responsiveness, and ensures high file consistency after the system resets and restarts.
  • The ext2 file system is a general-purpose file system without journaling. If it is reset or loses power, data being written or modified at that moment is likely to be lost, leaving metadata and data inconsistent. To solve this problem, the ext3 file system adds a journal and, at power-on, restores file system consistency by replaying the journal.
  • The two-layer metadata in this embodiment means that both the FLR 203 and the FAS 202 hold metadata: the FLR 203 holds the location-name information for file segment data, and the FAS 202 stores the correspondence between slice names and actual disk blocks.
  • The role of the FAC 201 is to send the related metadata modification requests; for this it can use the existing functions of the original distributed file system.
  • The FAS 202 is built on the lower layer of the two-layer metadata. It ensures that a valid metadata modification log can be constructed on the FAS 202, guaranteeing consistency on the FAS 202 side.
  • The FLR 203 is built on the upper layer of the two-layer metadata and mainly handles log replay and rollback after modifications to the upper metadata layer.
  • the interaction process between the FAC 201, the FAS 202, and the FLR 203 in the system can be as shown in FIG. 2.
  • the FAC 201 acquires the file data and pushes it to the FAS 202 for storing the data.
  • the FAS 202 records the file data pushed by the FAC 201, records the modification of the metadata on the FAS 202 in the buffer, and returns a file data push completion message to the FAC 201.
  • The FAS 202 periodically flushes the modification buffer to the ordinary log file, ahead of the data.
  • After the FAS 202 flushes the data to disk, the successful metadata modification is recorded in the buffer, which is periodically flushed to the log file.
  • the interaction between the FAC 201 and the FAS 202 and the FAS 202 flash write timing can be as shown in FIG. 3.
  • the FAC 201 transmits data a to the FAS 202.
  • the FAS 202 inserts a modification notice for data a into the modification buffer.
  • the FAS 202 writes the data a to the data buffer.
  • the FAS 202 returns to the FAC 201, informing it that data a has been written successfully. (After this point, the metadata modification notification is sent to the FLR 203.)
  • the FAC 201 sends the data b to the FAS 202.
  • the FAS 202 inserts a modification notice for data b into the modification buffer.
  • the FAS 202 writes the data b to the data buffer.
  • the FAS 202 returns to the FAC 201, informing it that data b has been written successfully. (Steps 5-8 concern different data and illustrate how fast the asynchronous notification is.)
  • the timed log task runs and writes the modification notices for a and b to disk.
  • the data of b is written to disk.
  • the timed log task runs and writes the write-completion notices for a and b to disk.
  • Upon receiving the file data push completion message returned by the FAS 202, the FAC 201 sends a metadata modification change request to the FLR 203, with the related log file system data attached to the request.
  • When the FAC 201 sends a metadata modification change request to the FLR 203, the following scheme may be adopted:
  • after receiving the file data push completion message returned by the FAS 202, the FAC 201 fills the corresponding metadata modification change request into the to-be-notified modification buffer.
  • Taking as an example the FAC 201 sending metadata modification change requests for data a, b, c and d to the FLR 203, the flow by which the FAC 201 sends metadata modification change requests to the FLR 203 can be as shown in FIG. 4.
  • Sending of a metadata synchronization message to the FLR 203 is triggered, and the timer is reset.
  • When the timer fires, the messages in the to-be-notified buffer are sent to the FLR 203 and the timer is reset.
  • This processing greatly reduces the number of messages the FLR 203 must handle, while keeping notification as close to real time as possible within a short interval.
  • The FLR 203 modifies the corresponding metadata and, using the log-related data attached to the request, records the metadata modification in the log system.
  • The FAS 202 flushes the data to disk and writes the log after confirming that the write succeeded.
  • the FLR 203 adds the relevant processed entries to the buffer of the corresponding FAS 202 in chronological order.
  • When it detects that the FAS 202 has restarted abnormally, the FLR 203 rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • the FLR 203 monitors whether the FAS 202 is abnormal by receiving a heartbeat message periodically sent by the FAS 202.
  • the FAS 202 periodically sends a still alive message to indicate that the FAS 202 is still working.
  • When heartbeat messages from the FAS 202 are received, the FLR 203 determines that the FAS 202 is normal; when consecutive heartbeat messages are lost, it determines that the FAS 202 is abnormal.
  • The FLR 203 takes no action on an isolated lost heartbeat, but when heartbeats are lost consecutively, it flags the FAS 202 that lost them as lagging, so that if the FAS 202 has really gone down, the related operations are rolled back.
  • When it detects that the FAS 202 has restarted abnormally, the FLR 203 performs the rollback according to the log records, that is, it rolls back the modified data recorded in the log over a set length of time counting back from the current time point of the log; the modified data for that set length of time corresponds to all modification records of the FAS 202, that is, the data modifications reported by the FAC.
  • When the FAS 202 powers on, it sends a rollback request to the FLR 203 to roll back the corresponding data; the FLR 203 rolls back the corresponding data to the buffer of the corresponding FAS 202 according to the rollback request, completing the repair of the log file system.
  • the processing flow of the FLR 203 in this embodiment can be as shown in FIG. 5.
  • When one of the FAS 202s restarts abnormally, the log system enters the repair process. The flow is first triggered on the FLR 203: once the FLR 203 confirms that a FAS 202 has restarted, the log system uses the log records on the FLR 203 to roll back all modification records corresponding to that FAS 202 over a specific length of time. Meanwhile, when the FAS 202 powers on, it uses its locally recorded log to roll back the related data that was written to the FAS 202 but not yet to disk, and sends a rollback request to the FLR 203 to roll back the corresponding data.
  • The FAC 201 acquires file data and sends it to the FAS 202; the FAS 202 records the file data pushed by the FAC 201, records in a buffer the modification of the corresponding metadata on the FAS 202, writes the log file, and returns a file data push completion message to the FAC 201; after receiving that message, the FAC 201 sends a metadata modification change request to the FLR 203; the FLR 203 modifies the corresponding metadata according to the request and records it in the log file system; when an abnormal restart of the FAS is detected, the FLR 203 rolls back the corresponding modified data according to the log records and completes the repair of the log file system.
  • This ensures that files end up highly consistent after the distributed file system resets and restarts, avoids data inconsistency between multiple copies caused by machine restarts, and minimizes the delay and performance loss introduced by adding the log system.
  • The log system is insensitive to the scale of the distributed system: its load is constant and does not increase as the cluster expands. It converges well and adds no network overhead.
  • All or part of the steps of the above embodiments may also be implemented with integrated circuits: the steps may be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module.
  • the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • When the devices/function modules/functional units in the above embodiments are implemented as software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • The embodiments of the invention ensure that files end up highly consistent after the distributed file system resets and restarts, avoid data inconsistency between multiple copies caused by machine restarts, and minimize the delay and performance loss introduced by adding the log system.
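The ext2/ext3 contrast described above rests on replaying a journal at power-on. For illustration only, here is a minimal Python sketch of a redo journal. It is not ext3's actual on-disk journal format, and the function names `write_transaction` and `replay` are hypothetical. Each transaction's block writes are appended to the journal, followed by a commit record. Recovery replays only committed transactions, so a write cut off by a reset is discarded rather than leaving metadata and data out of step.

```python
def write_transaction(journal, txn_id, writes):
    """Append a transaction's block writes, then its commit record."""
    for block, value in writes.items():
        journal.append(("WRITE", txn_id, block, value))
    journal.append(("COMMIT", txn_id))


def replay(journal, disk):
    """Apply only the writes of transactions whose COMMIT record survived."""
    committed = {rec[1] for rec in journal if rec[0] == "COMMIT"}
    for rec in journal:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, block, value = rec
            disk[block] = value
    return disk


# Example: transaction 1 committed; transaction 2 was cut off by a reset.
journal = []
write_transaction(journal, 1, {10: "inode-v2", 11: "data-v2"})
journal.append(("WRITE", 2, 12, "half-written"))  # no COMMIT follows
print(replay(journal, {10: "inode-v1", 11: "data-v1", 12: "old"}))
# {10: 'inode-v2', 11: 'data-v2', 12: 'old'}
```

Transaction 2 never reaches its commit record, so block 12 keeps its old value after replay.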
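The two-layer metadata arrangement can be pictured with the following Python sketch. The FLR holds location names for file segments, and each FAS maps slice names to disk blocks. The class names, the `(path, segment index)` key layout and the `resolve` helper are assumptions made for illustration; the text does not specify data structures.

```python
from dataclasses import dataclass, field


@dataclass
class FLRMetadata:
    """Upper layer (FLR): file segment -> (FAS id, slice name)."""
    segments: dict = field(default_factory=dict)

    def place(self, path, index, fas_id, slice_name):
        self.segments[(path, index)] = (fas_id, slice_name)

    def locate(self, path, index):
        return self.segments[(path, index)]


@dataclass
class FASMetadata:
    """Lower layer (one FAS): slice name -> local disk block numbers."""
    slices: dict = field(default_factory=dict)

    def map_slice(self, slice_name, blocks):
        self.slices[slice_name] = list(blocks)

    def blocks_of(self, slice_name):
        return self.slices[slice_name]


def resolve(flr, fas_nodes, path, index):
    """Walk both layers: FLR lookup first, then the owning FAS."""
    fas_id, slice_name = flr.locate(path, index)
    return fas_id, fas_nodes[fas_id].blocks_of(slice_name)


flr = FLRMetadata()
flr.place("/video/a.ts", 0, "fas-1", "slice-0001")
nodes = {"fas-1": FASMetadata()}
nodes["fas-1"].map_slice("slice-0001", [120, 121, 122])
print(resolve(flr, nodes, "/video/a.ts", 0))  # ('fas-1', [120, 121, 122])
```

Because each layer is held by a different component, each can keep and repair its own log, matching the FAS-side and FLR-side roles described above.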
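The FAS-side ordering of FIG. 3 can be modelled as below. The FAS acknowledges once the notice and the data are buffered, logs modification notices before the data reaches disk, and logs completion notices afterwards. This is a hypothetical in-memory Python sketch: `FASNode`, its buffers and `pending_after_restart` are illustrative names, and plain lists and dicts stand in for the log file and the disk. It shows why the log must lead the data. After a crash, any item logged as modified but never logged as done is exactly the data written to the FAS but not to disk, which the FAS rolls back on power-on.

```python
class FASNode:
    """Hypothetical in-memory model of one FAS following the FIG. 3 ordering."""

    def __init__(self):
        self.modify_buffer = []   # modification notices not yet in the log
        self.data_buffer = {}     # acknowledged data not yet on disk
        self.done_buffer = []     # write-completion notices not yet in the log
        self.log = []             # stands in for the log file: (kind, name)
        self.disk = {}            # stands in for the data disk: name -> payload

    def push(self, name, payload):
        """Accept data from the FAC: note the modification, buffer the data, ack."""
        self.modify_buffer.append(name)
        self.data_buffer[name] = payload
        return ("PUSH_DONE", name)  # the FAC may now notify the FLR

    def log_tick(self):
        """Timed log task: append all buffered notices to the log file."""
        self.log.extend(("MODIFY", n) for n in self.modify_buffer)
        self.log.extend(("DONE", n) for n in self.done_buffer)
        self.modify_buffer.clear()
        self.done_buffer.clear()

    def flush_data(self, name):
        """Write one buffered item to disk; its notice must already be logged."""
        if name in self.modify_buffer:
            raise RuntimeError(f"modification notice for {name!r} not logged yet")
        self.disk[name] = self.data_buffer.pop(name)
        self.done_buffer.append(name)

    def pending_after_restart(self):
        """Items logged as modified but never logged as done.

        These were acknowledged to the FAC but may not have reached disk,
        so they are the candidates to roll back and report to the FLR.
        """
        done = {n for kind, n in self.log if kind == "DONE"}
        return [n for kind, n in self.log if kind == "MODIFY" and n not in done]


fas = FASNode()
fas.push("a", b"A")
fas.push("b", b"B")
fas.log_tick()          # modification notices for a and b reach the log
fas.flush_data("a")     # only a reaches disk before the crash
fas.log_tick()          # completion notice for a reaches the log
print(fas.pending_after_restart())  # ['b']
```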
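The FAC-side batching of FIG. 4 might look like the following Python sketch. The FAC fills the to-be-notified buffer, sends early when a trigger fires, and otherwise flushes when the timer expires and then resets it. The extracted text does not state what triggers the early send, so the batch-size threshold `max_batch` is an assumption. So are the `interval` value and the `send_to_flr` callback, which stands in for the real message to the FLR. Time is passed in explicitly instead of using a real timer, so the behaviour is deterministic.

```python
class FACNotifier:
    """Hypothetical batching of metadata modification change requests.

    Assumption: a batch is sent when `max_batch` requests are pending, or
    when `interval` seconds have elapsed since the last send.
    """

    def __init__(self, send_to_flr, interval=0.05, max_batch=4, now=0.0):
        self.send_to_flr = send_to_flr
        self.interval = interval
        self.max_batch = max_batch
        self.pending = []                  # the to-be-notified buffer
        self.deadline = now + interval     # when the timer next fires

    def on_push_done(self, request, now):
        """Called when the FAS acknowledges a push."""
        self.pending.append(request)
        if len(self.pending) >= self.max_batch:
            self._flush(now)

    def on_tick(self, now):
        """Called periodically; fires the timer once it has expired."""
        if now >= self.deadline:
            self._flush(now)

    def _flush(self, now):
        if self.pending:
            self.send_to_flr(list(self.pending))
            self.pending.clear()
        self.deadline = now + self.interval  # reset the timer


sent = []
n = FACNotifier(sent.append, interval=0.05, max_batch=4)
for i, name in enumerate("abcd"):        # requests for data a, b, c, d
    n.on_push_done({"data": name}, now=0.001 * i)
print(len(sent), len(sent[0]))           # 1 4
```

Passing `now` explicitly also makes it easy to drive the notifier from whatever timer or event loop the FAC already has.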
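On the FLR side, three behaviours can be sketched together as an undo log with before-images: applying metadata modification change requests, keeping each FAS's entries in chronological order, and rolling back a set time window after that FAS restarts. This is one possible reading of the text, not a design the document specifies. `FLRLog`, `apply` and `rollback` are hypothetical names. Requests are assumed to arrive in chronological order per FAS, and keys modified by more than one FAS are not handled.

```python
from collections import defaultdict


class FLRLog:
    """Hypothetical FLR metadata store with a per-FAS, time-ordered undo log."""

    def __init__(self):
        self.metadata = {}                 # segment key -> location name
        self.per_fas = defaultdict(list)   # fas_id -> [(ts, key, old, new)]

    def apply(self, fas_id, ts, key, new_location):
        """Apply one metadata modification change request and log it.

        Assumes requests for a FAS arrive in timestamp order, so appending
        keeps that FAS's entries chronological.
        """
        old = self.metadata.get(key)       # before-image; None if key is new
        self.metadata[key] = new_location
        self.per_fas[fas_id].append((ts, key, old, new_location))

    def rollback(self, fas_id, now, window):
        """Undo this FAS's entries with ts >= now - window, newest first."""
        entries = self.per_fas[fas_id]
        cutoff = now - window
        undone = []
        while entries and entries[-1][0] >= cutoff:
            ts, key, old, new = entries.pop()
            if old is None:
                self.metadata.pop(key, None)   # key did not exist before
            else:
                self.metadata[key] = old       # restore the before-image
            undone.append((ts, key, new))
        return undone


flr = FLRLog()
flr.apply("fas-1", 10.0, ("/a", 0), "fas-1:slice-1")
flr.apply("fas-1", 95.0, ("/a", 1), "fas-1:slice-2")
flr.apply("fas-2", 96.0, ("/b", 0), "fas-2:slice-9")
flr.apply("fas-1", 98.0, ("/a", 0), "fas-1:slice-3")
undone = flr.rollback("fas-1", now=100.0, window=10.0)
print([key for _, key, _ in undone])   # [('/a', 0), ('/a', 1)]
```

Undoing newest-first means that when one key was modified several times inside the window, the oldest before-image is what remains. Entries of other FASs are untouched.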
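The heartbeat rule reduces to a per-FAS counter: ignore an isolated miss, and declare the FAS abnormal after several consecutive misses. In this hypothetical Python sketch the threshold of three missed periods is an assumed value, since the text only says heartbeats are lost consecutively. `HeartbeatMonitor` and its methods are illustrative names.

```python
class HeartbeatMonitor:
    """Hypothetical FLR-side detector: a FAS is declared abnormal after
    `max_misses` consecutive heartbeat periods without a heartbeat."""

    def __init__(self, max_misses=3):
        self.max_misses = max_misses
        self.misses = {}        # fas_id -> consecutive missed periods
        self.seen = set()       # FAS ids heard from in the current period

    def register(self, fas_id):
        self.misses[fas_id] = 0

    def on_heartbeat(self, fas_id):
        self.seen.add(fas_id)

    def end_period(self):
        """Close one heartbeat period; return FAS ids that just became abnormal."""
        abnormal = []
        for fas_id in self.misses:
            if fas_id in self.seen:
                self.misses[fas_id] = 0          # healthy: reset the counter
            else:
                self.misses[fas_id] += 1         # an isolated miss is tolerated
                if self.misses[fas_id] == self.max_misses:
                    abnormal.append(fas_id)
        self.seen.clear()
        return abnormal


m = HeartbeatMonitor(max_misses=3)
m.register("fas-1")
m.register("fas-2")
results = []
for period in range(4):
    m.on_heartbeat("fas-1")
    if period == 0:
        m.on_heartbeat("fas-2")              # fas-2 goes silent after period 0
    results.append(m.end_period())
print(results)                               # [[], [], [], ['fas-2']]
```

Reporting only when the counter reaches the threshold, not on every later miss, means the rollback for a crashed FAS is triggered once.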

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A data processing method for a distributed file system and a distributed file system. The method comprises: an FAC acquires file data and pushes the file data to an FAS (101); the FAS records the file data pushed by the FAC, records modification of corresponding metadata in the FAS this time in a buffer area, writes the modification into a log file, and returns a file data pushing completion message to the FAC (102); the FAC sends a metadata modification change request to an FLR (103); the FLR modifies the corresponding metadata according to the metadata modification change request, and records the modification in a log file system (104); and when unexpected restart of the FAS is detected, the FLR implements rollback operation of the corresponding modified data according to the log records (105).

Description

Data processing method for a distributed file system, and distributed file system
Technical field
This document relates to the field of distributed file storage technology, and in particular to a data processing method for a distributed file system and to a distributed file system.
Background
With the rapid development of the multimedia industry, and for reasons of cost, reliability and other considerations, more and more vendors deploy self-developed distributed upper-layer storage systems in their products, and distributed file systems have consequently developed rapidly. A distributed file system can deliver high throughput, several times that of an ordinary local file system, while also providing high reliability: multi-replica and redundant-copy techniques improve data reliability when a single machine fails. Compared with devices such as disk arrays, it also has the advantages of low cost and commodity hardware.
At present, some distributed file systems focus on throughput but weaken the guarantee of file system consistency, while others guarantee synchronous consistency at a large cost in write and modification performance. In a distributed system with a large number of machines, crashes and restarts are routine, so it is essential to ensure that the data in the multiple replicas of a file remain consistent after a server crashes and restarts.
Summary
This document provides a data processing method for a distributed file system, and a distributed file system, which avoid data inconsistency among multiple replicas caused by an FAS crash and restart.
A data processing method for a distributed file system comprises:
the FAC acquires file data and pushes it to the FAS;
the FAS records the file data pushed by the FAC, records in a buffer the corresponding metadata modification made on the FAS, writes it to a log file, and returns a file-data-push-completion message to the FAC;
after receiving the file-data-push-completion message returned by the FAS, the FAC sends a metadata modification request to the FLR;
the FLR modifies the corresponding metadata according to the metadata modification request and records the modification in the journaling file system;
when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing repair of the journaling file system.
Optionally, the step in which the FLR modifies the corresponding metadata according to the metadata modification request and records the modification in the journaling file system further comprises:
the FLR appends the processed entries, in chronological order, to the buffer corresponding to the FAS.
Optionally, the step in which, when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records and completes repair of the journaling file system comprises:
when an abnormal restart of the FAS is detected, the FLR, according to the log records, rolls the logged modifications back from the current point in the log by a set length of time, the modifications within that time window corresponding to all modification records of the FAS;
when the FAS powers on, it sends a rollback request to the FLR to roll back the corresponding data;
the FLR rolls back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing repair of the journaling file system.
Optionally, the step in which the FLR monitors the FAS for an abnormality comprises:
the FLR receives heartbeat messages sent periodically by the FAS;
when several consecutive heartbeat messages are detected as lost, the FAS is determined to be abnormal.
Optionally, the step in which, after receiving the file-data-push-completion message returned by the FAS, the FAC sends a metadata modification request to the FLR comprises:
after receiving the file-data-push-completion message returned by the FAS, the FAC places the corresponding metadata modification request into a pending-notification buffer;
when the set timer expires, all metadata modification requests in the pending-notification buffer are sent to the FLR.
An embodiment of the present invention further provides a distributed file system comprising an FAC, an FAS and an FLR, wherein:
the FAC is configured to acquire file data and push it to the FAS;
the FAS is configured to record the file data pushed by the FAC, record in a buffer the corresponding metadata modification made on the FAS, write it to a log file, and return a file-data-push-completion message to the FAC;
the FAC is further configured to send a metadata modification request to the FLR after receiving the file-data-push-completion message returned by the FAS;
the FLR is configured to modify the corresponding metadata according to the metadata modification request and record the modification in the journaling file system;
the FLR is further configured to, when an abnormal restart of the FAS is detected, roll back the corresponding modified data according to the log records, completing repair of the journaling file system.
Optionally, the FLR is further configured to append the processed entries, in chronological order, to the buffer corresponding to the FAS.
Optionally, the FLR is configured to, when an abnormal restart of the FAS is detected, roll the logged modifications back from the current point in the log by a set length of time, the modifications within that time window corresponding to all modification records of the FAS;
the FAS is configured to send a rollback request to the FLR when it powers on, to roll back the corresponding data;
the FLR is configured to roll back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing repair of the journaling file system.
Optionally, the FLR is configured to receive heartbeat messages sent periodically by the FAS, and to determine that the FAS is abnormal when several consecutive heartbeat messages are detected as lost.
Optionally, the FAC is configured to, after receiving the file-data-push-completion message returned by the FAS, place the corresponding metadata modification request into a pending-notification buffer, and, when the set timer expires, send all metadata modification requests in the pending-notification buffer to the FLR.
A computer-readable storage medium stores computer-executable instructions for performing the above method.
In the data processing method for a distributed file system and the distributed file system proposed by the embodiments of the present invention, the FAC acquires file data and pushes it to the FAS; the FAS records the file data pushed by the FAC, records in a buffer the corresponding metadata modification made on the FAS, writes it to a log file, and returns a file-data-push-completion message to the FAC; after receiving that message, the FAC sends a metadata modification request to the FLR; the FLR modifies the corresponding metadata according to the request and records the modification in the journaling file system; and when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing repair of the journaling file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids data inconsistency among multiple replicas caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the journal.
Brief description of the drawings
FIG. 1 is a schematic flowchart of an embodiment of the data processing method for a distributed file system according to the present invention;
FIG. 2 is a schematic diagram of the interaction flow among the FAC, FAS and FLR according to an embodiment of the present invention;
FIG. 3 is a schematic timing diagram of the interaction between the FAC and the FAS and of FAS flushing according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of the processing by which the FAC sends metadata modification requests to the FLR according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the processing flow of the FLR according to an embodiment of the present invention;
FIG. 6 is a schematic architecture diagram of an embodiment of the distributed file system according to the present invention.
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Detailed description of the embodiments
The solution of the embodiments of the present invention comprises: the FAC acquires file data and pushes it to the FAS; the FAS records the file data pushed by the FAC, records in a buffer the corresponding metadata modification made on the FAS, writes it to a log file, and returns a file-data-push-completion message to the FAC; after receiving that message, the FAC sends a metadata modification request to the FLR; the FLR modifies the corresponding metadata according to the request and records the modification in the journaling file system; and when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing repair of the journaling file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids data inconsistency among multiple replicas caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the journal.
The system operating environment of the method embodiment of the present invention comprises an FAC, an FAS and an FLR, wherein:
FAC (File Access Client, also called the file service client): configured to connect users with the data inside the distributed file system.
FAS (File Access Server, also called the file data server): configured to store the actual data of files.
FLR (File Location Register): configured to store information such as the metadata that maps files to their data.
At present, some distributed file systems focus on throughput at the expense of consistency guarantees and offer nothing comparable to the journaling of a local file system, while others guarantee synchronous consistency at a large cost in write and modification performance. Existing schemes cannot guarantee that the data in the multiple replicas of a file remain consistent after a server crashes and restarts.
This embodiment proposes a lagging journaling scheme for the two-tier metadata case. Without reducing file system responsiveness, it provides all the properties of a journaling file system, applied with a lag, and ensures high consistency of files after the system is reset and restarted.
On the role of the journal, taking local file systems as an example: ext2 (Second extended file system) is a general-purpose file system without journaling, so a reset or power failure may lose data that is being written or modified, leaving metadata and data inconsistent. ext3 (Third extended file system) addresses this by adding journaling: at power-on, replaying the journal restores file system consistency.
The two-tier metadata in this embodiment means that both the FLR and the FAS hold metadata: the FLR holds the location and name information of a file's slice data, while the FAS holds the mapping from slice names to actual disk blocks. Put simply, any distributed file system that keeps its own management metadata and is built on top of a local file system falls into this category of two-tier metadata distributed file systems.
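The two-tier split can be illustrated with a minimal Python sketch, assuming plain dictionaries stand in for whatever persistent structures a real FLR or FAS uses. The class names `FLRMetadata` and `FASMetadata`, the function `resolve`, and the example file and slice names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FLRMetadata:
    """Upper tier (hypothetical layout): file path -> ordered (slice name, FAS id) pairs."""
    files: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


@dataclass
class FASMetadata:
    """Lower tier (hypothetical layout): slice name -> disk block numbers on one FAS."""
    fas_id: str
    slices: dict[str, list[int]] = field(default_factory=dict)


def resolve(path: str, flr: FLRMetadata,
            servers: dict[str, FASMetadata]) -> list[tuple[str, str, list[int]]]:
    """Walk both tiers: FLR gives slice locations, each FAS gives the disk blocks."""
    out = []
    for slice_name, fas_id in flr.files[path]:
        blocks = servers[fas_id].slices[slice_name]
        out.append((fas_id, slice_name, blocks))
    return out
```

Because each tier can change independently, each needs its own journal; this is why the scheme keeps one log on the FAS and one on the FLR.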
In this embodiment, the FAC's role is to send the related metadata modification requests, which can reuse existing functionality of the original distributed file system.
The FAS part is built on the lower tier of the two-tier metadata; it maintains an effective journal of metadata modification records on the FAS, ensuring consistency on the FAS side.
The FLR part is built on the upper tier of the two-tier metadata and mainly handles journal replay and rollback after the upper metadata tier has been modified.
The interaction among the FAC, FAS and FLR in the system can be as shown in FIG. 1 and FIG. 2.
As shown in FIG. 1, an embodiment of the present invention provides a data processing method for a distributed file system, comprising:
Step S101: the FAC acquires file data and pushes it to the FAS;
Step S102: the FAS records the file data pushed by the FAC, records in a buffer the corresponding metadata modification made on the FAS, writes it to a log file, and returns a file-data-push-completion message to the FAC;
In addition, the FAS periodically flushes the modification buffer to the regular log file, ahead of the corresponding data.
After the FAS flushes data to disk, it places a completion record for each successfully flushed metadata modification into the buffer, which is likewise periodically flushed to the log file.
The interaction between the FAC and the FAS, and the FAS flush timing, can be as shown in FIG. 3.
Taking the FAC sending data a and data b to the FAS as an example, the processing flow is as follows:
1. The FAC sends data a to the FAS.
2. The FAS inserts a notification of the modification of a into the modification buffer.
3. The FAS writes data a into the data buffer.
4. The FAS replies to the FAC that data a has been written successfully. (From this point on, the metadata modification notification to the FLR is enabled.)
5. The FAC sends data b to the FAS.
6. The FAS inserts a notification of the modification of b into the modification buffer.
7. The FAS writes data b into the data buffer.
8. The FAS replies to the FAC that data b has been written successfully. (Steps 5 to 8 handle a different piece of data and show the speed of asynchronous notification.)
9. The periodic log task flushes; the modification notifications for a and b are written to disk.
10. The data of a is written to disk.
11. A notification that a's data has been written to disk is inserted into the modification buffer.
12. The data of b is written to disk.
13. A notification that b's data has been written to disk is inserted into the modification buffer.
14. The periodic log task flushes; the disk-write completion notifications for a and b are written to disk.
At this point the complete log sequence has been written, and the FAS-side journal is complete.
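The FAS-side ordering in steps 1 to 14 can be modeled with a minimal Python sketch, assuming in-memory lists stand in for the log file and dictionaries stand in for the data buffer and the disk. `FASJournal` and its methods are hypothetical names. The sketch enforces the rule stated above: a modification record reaches the log before its data reaches disk, and a completion record is queued only after the data write. At recovery, any key whose last logged record is a modification without a completion is a rollback candidate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FASJournal:
    """Hypothetical in-memory model of the FAS-side lagging journal."""
    mod_buffer: list[tuple[str, str]] = field(default_factory=list)  # records not yet logged
    data_buffer: dict[str, bytes] = field(default_factory=dict)      # data not yet on disk
    log: list[tuple[str, str]] = field(default_factory=list)         # stands in for the log file
    disk: dict[str, bytes] = field(default_factory=dict)             # stands in for the disk

    def write(self, key: str, data: bytes) -> str:
        # Steps 2-4: record the modification, buffer the data, acknowledge at once.
        self.mod_buffer.append(("modify", key))
        self.data_buffer[key] = data
        return "push-complete"

    def flush_log(self) -> None:
        # Steps 9 and 14: the periodic log task appends buffered records to the log.
        self.log.extend(self.mod_buffer)
        self.mod_buffer.clear()

    def flush_data(self) -> None:
        # Steps 10-13: only data whose "modify" record is already logged may reach disk.
        logged = {k for op, k in self.log if op == "modify"}
        for key in [k for k in self.data_buffer if k in logged]:
            self.disk[key] = self.data_buffer.pop(key)
            self.mod_buffer.append(("done", key))  # completion record, logged later

    def uncommitted(self) -> list[str]:
        """Keys whose last logged record is 'modify': candidates for rollback."""
        state: dict[str, str] = {}
        for op, key in self.log:
            state[key] = op
        return [k for k, op in state.items() if op == "modify"]
```

If the FAS crashes after step 13 but before step 14, the data is already on disk while its completion record is not yet logged, so the sketch still reports the key as uncommitted. This trades occasional unnecessary rollbacks for not waiting on the log before acknowledging the FAC.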
Step S103: after receiving the file-data-push-completion message returned by the FAS, the FAC sends a metadata modification request to the FLR;
After receiving the file-data-push-completion message returned by the FAS, the FAC sends a metadata modification request to the FLR, attaching the related journaling data to the request.
As an optional implementation, the FAC may send metadata modification requests to the FLR as follows:
After receiving the file-data-push-completion message returned by the FAS, the FAC places the corresponding metadata modification request into the pending-notification buffer.
When the set timer expires, all metadata modification requests in the pending-notification buffer are sent to the FLR.
Taking the FAC sending metadata modification requests for data a, b, c and d to the FLR as an example, the processing flow can be as shown in FIG. 4:
1. After writing file x, the FAC places the modification of a into the pending-notification buffer;
2. After writing file x, the FAC places the modification of b into the pending-notification buffer;
3. After writing file x, the FAC places the modification of c into the pending-notification buffer;
4. After writing file y, the FAC places the modification of d into the pending-notification buffer.
If at this point the check finds that the required time interval has elapsed while the timer has not yet fired, a metadata synchronization message is sent to the FLR and the timer is reset.
Later, when the timer fires, the messages in the pending-notification buffer are sent to the FLR and the timer is reset again. This approach greatly reduces the number of messages the FLR master has to handle, while keeping notifications close to real time over short intervals.
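The pending-notification buffer of FIG. 4 can be sketched as follows, assuming the caller passes the current time instead of relying on a real timer thread, so the behavior is deterministic. `PendingNotifier`, `add`, `tick` and the `send` callback are hypothetical; `send` stands in for the FAC-to-FLR message.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class PendingNotifier:
    """Hypothetical FAC-side batcher: one FLR message per interval, not one per write."""
    interval: float
    send: Callable[[list[str]], None]        # stand-in for the FAC -> FLR request
    pending: list[str] = field(default_factory=list)
    deadline: float | None = None

    def add(self, request: str, now: float) -> None:
        # Called once the FAS has acknowledged the data push.
        self.pending.append(request)
        if self.deadline is None:
            self.deadline = now + self.interval  # arm the timer on the first pending request
        self.tick(now)                            # also check whether the interval has elapsed

    def tick(self, now: float) -> None:
        # Called when the timer fires: send the whole batch and disarm until the next add.
        if self.deadline is not None and now >= self.deadline:
            batch, self.pending = self.pending, []
            self.deadline = None
            if batch:
                self.send(batch)
```

Each request waits at most about one interval, and the FLR receives a single batched message instead of one message per write.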
Step S104: the FLR modifies the corresponding metadata according to the metadata modification request and records the modification in the journaling file system;
After receiving the metadata modification, the FLR modifies the corresponding metadata and, using the attached journaling data, records the modification in the journal. Meanwhile, the FAS flushes data to disk and writes the log once the write is confirmed successful.
In addition, the FLR appends the processed entries, in chronological order, to the buffer corresponding to the FAS.
Step S105: when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing repair of the journaling file system.
The FLR monitors whether the FAS is abnormal by receiving the heartbeat messages the FAS sends periodically.
The FAS periodically sends a heartbeat ("still alive") message to indicate that it is still working.
While heartbeat messages from the FAS are being received, the FAS is considered normal; when several consecutive heartbeat messages are lost, the FAS is determined to be abnormal.
The FLR does not otherwise process the heartbeat messages, but if heartbeats from an FAS are lost consecutively, the FLR applies lagging handling to that FAS, so that if the FAS has really crashed and reset, the related operations are rolled back.
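A minimal sketch of the consecutive-loss rule follows, assuming heartbeats are counted in fixed periods and an FAS is declared abnormal after three silent periods in a row. The threshold, the period model and the names `HeartbeatMonitor`, `heartbeat` and `end_period` are assumptions, since the text says only "several consecutive" losses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical FLR-side detector for consecutive heartbeat loss."""
    max_missed: int = 3                               # assumed threshold
    missed: dict[str, int] = field(default_factory=dict)
    seen: set[str] = field(default_factory=set)

    def heartbeat(self, fas_id: str) -> None:
        # A "still alive" message needs no processing beyond noting it.
        self.seen.add(fas_id)

    def end_period(self, all_fas: list[str]) -> list[str]:
        """Close one heartbeat period; return the FASs that just crossed the threshold."""
        newly_abnormal = []
        for fas_id in all_fas:
            if fas_id in self.seen:
                self.missed[fas_id] = 0               # any heartbeat resets the count
            else:
                self.missed[fas_id] = self.missed.get(fas_id, 0) + 1
                if self.missed[fas_id] == self.max_missed:
                    newly_abnormal.append(fas_id)
        self.seen.clear()
        return newly_abnormal
```

Reporting only the period in which the threshold is crossed means the rollback for a given outage is triggered once, not once per further missed period.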
When an abnormal restart of the FAS is detected, the FLR performs a rollback according to the log records: starting from the current point in time, it rolls the logged modifications back by a specific length of time. The modifications within that window correspond to all modification records of the FAS, that is, the data modifications reported by the FAC.
When the FAS powers on, it sends a rollback request to the FLR to roll back the corresponding data; the FLR rolls back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing repair of the journaling file system.
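The FLR-side rollback window can be sketched as follows, assuming each metadata modification is journaled together with its previous value and that records for each FAS are appended in timestamp order, as the text requires. `FLRJournal`, `apply`, `rollback` and the `window` parameter (the "set length of time") are hypothetical. For simplicity, the sketch also assumes that each metadata key is modified through only one FAS. Undoing a key that another FAS changed later would need conflict handling that the text does not describe.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FLRJournal:
    """Hypothetical FLR journal: per-FAS, time-ordered (ts, key, old, new) undo records."""
    window: float                                    # the "set length of time"
    metadata: dict[str, str] = field(default_factory=dict)
    per_fas: dict[str, list[tuple[float, str, str | None, str]]] = field(
        default_factory=lambda: defaultdict(list))

    def apply(self, fas_id: str, ts: float, key: str, new: str) -> None:
        # Modify metadata and append undo information; callers must use increasing ts.
        old = self.metadata.get(key)
        self.metadata[key] = new
        self.per_fas[fas_id].append((ts, key, old, new))

    def rollback(self, fas_id: str, now: float) -> list[str]:
        """Undo this FAS's records newer than now - window, newest first."""
        cutoff = now - self.window
        records = self.per_fas[fas_id]
        undone = []
        while records and records[-1][0] >= cutoff:
            _ts, key, old, _new = records.pop()
            if old is None:
                self.metadata.pop(key, None)         # key did not exist before
            else:
                self.metadata[key] = old             # restore the previous value
            undone.append(key)
        return undone
```

Undoing newest-first restores each key to the value it had at the start of the window, even if that key was modified several times inside the window.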
The processing flow of the FLR in this embodiment can be as shown in FIG. 5.
When one of the FASs crashes abnormally and restarts, the journaling system enters the repair process. The process is first triggered on the FLR: once the FLR confirms that an FAS has restarted, the journaling system uses the FLR's log records to roll back all modification records corresponding to that FAS within a specific length of time. Meanwhile, when that FAS powers on, it uses its local log to roll back data that was written to the FAS but not yet to disk, and sends a rollback request to the FLR to roll back the corresponding data.
Once both processes have completed, the repair is finished. During repair, the system continues to serve consistent data from the other replicas, so the repair is invisible to users.
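The two-phase repair and the replica fallback can be sketched as follows. `RepairTracker`, the phase names `flr_rollback` and `fas_rollback`, and the first-healthy-replica read policy are assumptions for illustration. The text states only that repair finishes when both processes complete and that other replicas keep serving consistent data in the meantime.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepairTracker:
    """Hypothetical bookkeeping: an FAS leaves repair only after both phases finish."""
    phases: dict[str, set[str]] = field(default_factory=dict)

    def start(self, fas_id: str) -> None:
        # Both the FLR-side and the FAS-side rollback must complete.
        self.phases[fas_id] = {"flr_rollback", "fas_rollback"}

    def finish(self, fas_id: str, phase: str) -> None:
        remaining = self.phases.get(fas_id)
        if remaining is not None:
            remaining.discard(phase)
            if not remaining:
                del self.phases[fas_id]      # repair complete

    def in_repair(self, fas_id: str) -> bool:
        return fas_id in self.phases

    def pick_replica(self, replicas: list[str]) -> str | None:
        """Serve reads from the first replica that is not under repair."""
        for fas_id in replicas:
            if not self.in_repair(fas_id):
                return fas_id
        return None
```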
The system of this embodiment provides all the properties of a lagging journaling file system without reducing file system responsiveness, and ensures high consistency of files after the system is reset and restarted.
Compared with the related art, in the solution of this embodiment the FAC acquires file data and pushes it to the FAS; the FAS records the file data pushed by the FAC, records in a buffer the corresponding metadata modification made on the FAS, writes it to a log file, and returns a file-data-push-completion message to the FAC; after receiving that message, the FAC sends a metadata modification request to the FLR; the FLR modifies the corresponding metadata according to the request and records the modification in the journaling file system; and when an abnormal restart of the FAS is detected, the FLR rolls back the corresponding modified data according to the log records, completing repair of the journaling file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids data inconsistency among multiple replicas caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the journal.
The journaling system of this embodiment is independent of the scale of the distributed system: the load it places on the system is constant and does not grow as the cluster expands. It converges well and adds no network overhead, and the load on the disk hosting the journal is minimal. It is a high-performance, low-latency journaling file system obtained at the cost of a higher false-kill rate, that is, more operations rolled back unnecessarily.
As shown in FIG. 6, an embodiment of the present invention provides a distributed file system comprising an FAC 201, an FAS 202 and an FLR 203, wherein:
the FAC 201 is configured to acquire file data and push it to the FAS 202;
the FAS 202 is configured to record the file data pushed by the FAC 201, record in a buffer the corresponding metadata modification made on the FAS 202, write it to a log file, and return a file-data-push-completion message to the FAC 201;
the FAC 201 is further configured to send a metadata modification request to the FLR 203 after receiving the file-data-push-completion message returned by the FAS 202;
the FLR 203 is configured to modify the corresponding metadata according to the metadata modification request and record the modification in the journaling file system;
the FLR 203 is further configured to, when an abnormal restart of the FAS 202 is detected, roll back the corresponding modified data according to the log records, completing repair of the journaling file system.
FAC 201: the file service client, configured to connect users with the data inside the distributed file system.
FAS 202: the file data server, configured to store the actual data of files.
FLR 203: the file location register, configured to store information such as the metadata that maps files to their data.
由于目前在大部分的分布式文件系统中,一部分注重吞吐量性能,但是却降低了文件系统一致性的保证,并没有提供类似于本地文件系统日志文件系统的保障。而另一部分在保证了同步的一致性的情况下,却大大降低了写和修改的性能。相关方案在服务器宕机重启后,无法保证文件多个副本内数据的一致性。Since most of the distributed file systems currently focus on throughput performance, they reduce the guarantee of file system consistency and do not provide a guarantee similar to the local file system log file system. The other part, while ensuring the consistency of synchronization, greatly reduces the performance of writing and modification. The related solution cannot guarantee the consistency of data in multiple copies of the file after the server is restarted.
本实施例方案提出一种针对双层元数据情况下的,滞后形的日志文件系统方式,可以在不降低文件系统响应的前提下,提供滞后的日志文件系统的所有特性,保证系统复位重启后文件的高一致性。 The solution of the embodiment provides a log file system mode for the double-layer metadata, which can provide all the characteristics of the lagging log file system without restoring the response of the file system, and ensure the system reset after restarting. High consistency of the file.
关于日志文件的作用:以本地文件系统为例,ext2文件系统是一个通用的文件系统,本身不带有日志文件系统的功能,在复位、断电过程中很可能会丢失正在写或修改的一些数据,而造成元数据与数据的不一致性。而针对这一问题,ext3文件系统进行了改进,添加了日志系统的功能,在上电的时候通过对日志部分的重放,修正文件系统的一致性。About the role of the log file: Take the local file system as an example. The ext2 file system is a general file system. It does not have the function of the log file system. It is likely to lose some of the write or modify during the reset and power off process. Data, resulting in inconsistency between metadata and data. To solve this problem, the ext3 file system has been improved, and the function of the log system has been added. When the power is turned on, the consistency of the file system is corrected by replaying the log portion.
本实施例所涉及的双层元数据是指:在FLR 203和FAS 202上都有对应元数据的成分,FLR 203上对应的是文件分片数据位置名称信息,FAS202上存放着分片名称与实际磁盘块的对应信息。通俗的讲,构建在本地文件系统之上的含有管理元数据的分布式文件系统,都属于这种双层元数据分布式文件系统范畴。The double-layer metadata involved in this embodiment means that there are components corresponding to the metadata on the FLR 203 and the FAS 202, and the FLR 203 corresponds to the file segment data location name information, and the FAS 202 stores the slice name and Corresponding information of the actual disk block. In layman's terms, a distributed file system with management metadata built on top of the local file system falls into the category of such a two-tier metadata distributed file system.
In this embodiment, the FAC 201 is responsible for sending metadata modification change requests, and can reuse the corresponding functions of the existing distributed file system.
The FAS 202 function is built on the lower layer of the two-layer metadata. It maintains a valid journal of metadata modifications on the FAS 202, ensuring consistency on the FAS 202 side.
The FLR 203 is built on the upper layer of the two-layer metadata and is mainly responsible for journal replay and rollback after modifications to the upper metadata layer.
The interaction among the FAC 201, the FAS 202, and the FLR 203 in the system may be as shown in FIG. 2.
First, the FAC 201 acquires file data and pushes it to the FAS 202 for storage.
The FAS 202 records the file data pushed by the FAC 201, records the resulting metadata modification on the FAS 202 in a buffer, and returns a file data push completion message to the FAC 201.
In addition, the FAS 202 periodically flushes the modification buffer to the regular journal file, ahead of the data.
After the FAS 202 flushes data to disk, it puts a completion record for each successfully flushed metadata modification into the buffer, which is likewise flushed to the journal file periodically.
The interaction between the FAC 201 and the FAS 202, and the flush timing of the FAS 202, may be as shown in FIG. 3.
Taking the FAC 201 sending data a and data b to the FAS 202 as an example, the processing flow is as follows:
1. The FAC 201 sends data a to the FAS 202.
2. The FAS 202 inserts a modification notice for data a into the modification buffer.
3. The FAS 202 writes data a into the data buffer.
4. The FAS 202 replies to the FAC 201 that data a has been written successfully. (From this point on, the metadata modification notification to the FLR 203 can be sent.)
5. The FAC 201 sends data b to the FAS 202.
6. The FAS 202 inserts a modification notice for data b into the modification buffer.
7. The FAS 202 writes data b into the data buffer.
8. The FAS 202 replies to the FAC 201 that data b has been written successfully. (Steps 5 to 8 handle a different piece of data; they show how quickly the asynchronous acknowledgement returns.)
9. The periodic journal flush task runs: the modification notices for a and b are written to disk.
10. Data a is written to disk.
11. A completion notice for writing data a to disk is inserted into the modification buffer.
12. Data b is written to disk.
13. A completion notice for writing data b to disk is inserted into the modification buffer.
14. The periodic journal flush task runs again: the disk-write completion notices for a and b are written to disk.
At this point the complete journal sequence has been written, and the FAS 202 side journal is complete.
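The fourteen steps above can be replayed with the following Python sketch of the FAS-side write path. It is a simplified in-memory model under assumed names (`FASJournalSketch`, `write`, `flush_log`, `flush_data`); lists and dictionaries stand in for the modification buffer, the data buffer, the on-disk journal, and the disk. The ordering rule it enforces is the one stated above: a modification notice reaches the journal before the data it describes reaches disk, and a completion notice is queued only after the data write.

```python
class FASJournalSketch:
    """Hypothetical in-memory model of the FAS write path described in FIG. 3."""

    def __init__(self):
        self.mod_buffer = []   # pending journal records, e.g. ("intent", "a")
        self.data_buffer = {}  # data accepted from the FAC but not yet on disk
        self.journal = []      # records already flushed to the on-disk journal
        self.disk = {}         # data already written to disk

    def write(self, name, data):
        # Steps 1-4 / 5-8: log the modification notice, buffer the data, acknowledge at once.
        self.mod_buffer.append(("intent", name))
        self.data_buffer[name] = data
        return "push-complete"

    def flush_log(self):
        # Steps 9 and 14: the periodic task moves buffered records into the journal.
        self.journal.extend(self.mod_buffer)
        self.mod_buffer.clear()

    def flush_data(self):
        # Write-ahead rule: journal every pending notice before writing its data.
        self.flush_log()
        # Steps 10-13: write each buffered item, then queue a completion record for it.
        for name, data in self.data_buffer.items():
            self.disk[name] = data
            self.mod_buffer.append(("done", name))
        self.data_buffer.clear()


fas = FASJournalSketch()
fas.write("a", b"A")
fas.write("b", b"B")
fas.flush_log()   # step 9
fas.flush_data()  # steps 10-13
fas.flush_log()   # step 14
print(fas.journal)
# [('intent', 'a'), ('intent', 'b'), ('done', 'a'), ('done', 'b')]
```

The acknowledgement from `write` returns before anything is on disk, which is what lets steps 5 to 8 proceed without waiting for steps 9 to 14.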
After receiving the file data push completion message returned by the FAS 202, the FAC 201 sends a metadata modification change request to the FLR 203, carrying the journal-related data with the request.
As an optional implementation, the FAC 201 may send metadata modification change requests to the FLR 203 as follows:
After receiving the file data push completion message returned by the FAS 202, the FAC 201 places the corresponding metadata modification change request into a pending-notification buffer.
When the configured timer expires, all metadata modification change requests in the pending-notification buffer are sent to the FLR 203.
Taking the FAC 201 sending metadata modification change requests for data a, data b, data c, and data d to the FLR 203 as an example, the processing flow may be as shown in FIG. 4.
1. After writing file x, the FAC 201 places modification a into the pending-notification buffer;
2. After writing file x, the FAC 201 places modification b into the pending-notification buffer;
3. After writing file x, the FAC 201 places modification c into the pending-notification buffer;
4. After writing file y, the FAC 201 places modification d into the pending-notification buffer.
At this point, if the elapsed time has reached the required interval but the timer has not yet fired, a metadata synchronization message is sent to the FLR 203 and the timer is reset.
When the timer fires after a period of time, the messages in the pending-notification buffer are sent to the FLR 203 and the timer is reset. This approach greatly reduces the number of messages the FLR 203 master has to handle, while the short interval keeps notifications as close to real time as possible.
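The pending-notification buffer described above can be sketched as a small timer-driven batcher. The class name `NotifyBatcher`, the `send` callback, and the injectable `clock` are assumptions made for illustration and testability; in this sketch a real FAC would call `poll()` from its timer, and the interval is left to configuration.

```python
import time


class NotifyBatcher:
    """Hypothetical FAC-side pending-notification buffer flushed on a timer."""

    def __init__(self, send, interval_s, clock=time.monotonic):
        self.send = send            # callback that delivers one batch to the FLR
        self.interval_s = interval_s
        self.clock = clock
        self.pending = []
        self.deadline = clock() + interval_s

    def add(self, request):
        # Called after the FAS acknowledges a push; the request waits for the next flush.
        self.pending.append(request)
        self.poll()

    def poll(self):
        # Called by the timer (and on add): flush once the interval has elapsed.
        now = self.clock()
        if now >= self.deadline:
            if self.pending:
                self.send(list(self.pending))
                self.pending.clear()
            self.deadline = now + self.interval_s  # reset the timer


fake_now = [0.0]
sent = []
batcher = NotifyBatcher(sent.append, interval_s=1.0, clock=lambda: fake_now[0])
for request in ["a", "b", "c", "d"]:
    batcher.add(request)
fake_now[0] = 1.0
batcher.poll()
print(sent)
# [['a', 'b', 'c', 'd']]
```

Four modifications thus cost the FLR one message instead of four; the trade-off is that a notification may wait up to one interval before the FLR sees it.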
On receiving the metadata modification change, the FLR 203 modifies the corresponding metadata and, using the attached journal-related data, records the modification in the journal. Meanwhile, the FAS 202 flushes the data to disk and flushes its journal once the write is confirmed successful.
In addition, the FLR 203 adds the processed entries, in chronological order, to the buffer of the corresponding FAS 202.
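A minimal sketch of the FLR side, assuming that each metadata modification change request carries a timestamp, the FAS it concerns, the file path, and the segment name; these fields and the names `ChangeRequest` and `FLRSketch` are hypothetical. The FLR updates its location metadata and keeps, per FAS, a journal ordered by time, which a later rollback for that FAS can walk backwards through.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    timestamp: float
    fas_id: str
    path: str
    segment: str


class FLRSketch:
    """Hypothetical FLR: location metadata plus one time-ordered journal per FAS."""

    def __init__(self):
        self.locations = defaultdict(set)  # path -> {(fas_id, segment), ...}
        self.fas_logs = defaultdict(list)  # fas_id -> [ChangeRequest, ...] by timestamp

    def apply_batch(self, batch):
        touched = set()
        for req in batch:
            self.locations[req.path].add((req.fas_id, req.segment))
            self.fas_logs[req.fas_id].append(req)
            touched.add(req.fas_id)
        # Keep each per-FAS journal chronological even if a batch arrives unsorted.
        for fas_id in touched:
            self.fas_logs[fas_id].sort(key=lambda r: r.timestamp)


flr = FLRSketch()
flr.apply_batch([
    ChangeRequest(2.0, "fas1", "/x", "x.1"),
    ChangeRequest(1.0, "fas1", "/x", "x.0"),
])
print([r.timestamp for r in flr.fas_logs["fas1"]])
# [1.0, 2.0]
```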
When an abnormal restart of the FAS 202 is detected, the FLR 203 rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
The FLR 203 monitors whether the FAS 202 is abnormal by receiving heartbeat messages periodically sent by the FAS 202.
The FAS 202 periodically sends a "still alive" message to indicate that it is still working.
While heartbeat messages from the FAS 202 are being received, the FAS 202 is judged normal; when several consecutive heartbeat messages are lost, the FAS 202 is judged abnormal.
The FLR 203 does no further processing on the heartbeat messages sent by the FAS 202. However, if heartbeats from a FAS 202 are lost several times in a row, the FLR 203 must apply deferred handling to that FAS 202, so that if the FAS 202 really has crashed and reset, the related operations are rolled back.
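Because the FLR does nothing with a heartbeat except note that it arrived, failure detection reduces to counting missed periods. The sketch below assumes a fixed heartbeat period and a threshold of consecutive misses; `HeartbeatMonitor`, `period_s`, and `max_missed` are illustrative names, and this embodiment does not specify their values.

```python
class HeartbeatMonitor:
    """Hypothetical FLR-side detector: a FAS is suspect after max_missed silent periods."""

    def __init__(self, period_s, max_missed):
        self.period_s = period_s
        self.max_missed = max_missed
        self.last_seen = {}  # fas_id -> time of the most recent "still alive" message

    def on_heartbeat(self, fas_id, now):
        self.last_seen[fas_id] = now

    def suspected(self, now):
        # Count whole heartbeat periods that passed with no message from each FAS.
        return sorted(
            fas_id
            for fas_id, seen in self.last_seen.items()
            if (now - seen) // self.period_s >= self.max_missed
        )


monitor = HeartbeatMonitor(period_s=1.0, max_missed=3)
monitor.on_heartbeat("fas1", now=0.0)
monitor.on_heartbeat("fas2", now=2.5)
print(monitor.suspected(now=3.0))
# ['fas1']
```

Requiring several consecutive misses, rather than one, keeps a single delayed heartbeat from triggering a rollback.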
When an abnormal restart of the FAS 202 is detected, the FLR 203 performs a rollback based on the journal records: it rolls back the journaled modifications from the current point in the journal by a set length of time. The modifications within that window correspond to all modification records of the FAS 202, that is, the data modification changes reported by the FAC.
When the FAS 202 powers on, it sends a rollback request to the FLR 203 to roll back the corresponding data; the FLR 203 rolls back the corresponding data to the buffer of the corresponding FAS 202 according to the rollback request, completing the repair of the journaling file system.
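The time-window rollback on the FLR can be sketched as follows, assuming the per-FAS journal is a list ordered by timestamp and each entry records the location it added; the `LogEntry` layout and `rollback_window` are hypothetical. Entries at or after `now - window_s` are undone newest first and returned, so that they can be placed back into the buffer of the restarted FAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float
    path: str
    location: str  # hypothetical "fas_id:segment" label added by this modification


def rollback_window(log, locations, now, window_s):
    """Undo every entry in the time-ordered log with timestamp >= now - window_s."""
    cutoff = now - window_s
    undone = []
    while log and log[-1].timestamp >= cutoff:
        entry = log.pop()  # newest first, so later changes are undone before earlier ones
        locations.get(entry.path, set()).discard(entry.location)
        undone.append(entry)
    return undone


log = [
    LogEntry(1.0, "/x", "fas1:x.0"),
    LogEntry(5.0, "/x", "fas1:x.1"),
    LogEntry(9.0, "/x", "fas1:x.2"),
]
locations = {"/x": {"fas1:x.0", "fas1:x.1", "fas1:x.2"}}
undone = rollback_window(log, locations, now=10.0, window_s=6.0)
print([e.timestamp for e in undone], locations)
```

Undoing by simply removing the added location assumes each journaled change only added a location; a change that overwrote older metadata would also need its previous value journaled.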
The processing flow of the FLR 203 in this embodiment may be as shown in FIG. 5.
When one of the FAS 202 nodes crashes abnormally and restarts, the journaling system enters the repair process. The process is first triggered on the FLR 203: once the FLR 203 confirms that a FAS 202 has restarted, the journaling system uses the journal records on the FLR 203 to roll back all modification records for that FAS 202 within a specific time window. At the same time, when that FAS 202 powers on, it uses its locally recorded journal to roll back data that was written to the FAS 202 but never reached disk, and sends a rollback request to the FLR 203 to roll back the corresponding data.
When both processes have completed, the repair is finished. During repair, the system continues to serve consistent data from the other replicas, so the process is invisible to users.
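On the FAS side, the local journal is enough to find the data that must be rolled back: any item with a modification notice but no matching disk-write completion notice was acknowledged to the FAC but never reached disk. The sketch below assumes journal records of the form `("intent", name)` and `("done", name)`; that record shape, `unfinished_writes`, and the request dictionary built by `build_rollback_request` are illustrative, not a message format defined in this embodiment.

```python
def unfinished_writes(journal):
    """Names with an "intent" record but no later "done" record, in journal order."""
    pending = {}  # dicts keep insertion order, so the result follows the journal
    for kind, name in journal:
        if kind == "intent":
            pending[name] = True
        elif kind == "done":
            pending.pop(name, None)
    return list(pending)


def build_rollback_request(fas_id, journal):
    # Hypothetical request body sent to the FLR when the restarted FAS powers on.
    return {"fas_id": fas_id, "segments": unfinished_writes(journal)}


journal = [("intent", "a"), ("intent", "b"), ("done", "a"), ("intent", "c")]
print(build_rollback_request("fas1", journal))
# {'fas_id': 'fas1', 'segments': ['b', 'c']}
```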
The system of this embodiment provides all the properties of a lagging journaling file system without degrading file-system response, and ensures high file consistency after the system is reset and restarted.
Compared with the related art, in the solution of this embodiment: the FAC 201 acquires file data and pushes it to the FAS 202; the FAS 202 records the file data pushed by the FAC 201, records the corresponding metadata modification on the FAS 202 in a buffer, writes it to the journal file, and returns a file data push completion message to the FAC 201; after receiving that message, the FAC 201 sends a metadata modification change request to the FLR 203; the FLR 203 modifies the corresponding metadata according to the request and records it in the journaling file system; and when an abnormal restart of the FAS 202 is detected, the FLR 203 rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system. This guarantees eventual high consistency of files after the distributed file system is reset and restarted, avoids data inconsistency between replicas caused by a machine crash and restart, and minimizes the latency and performance cost of adding the journaling system.
In the embodiments of the present invention, the journaling system is insensitive to the scale of the distributed system: the load it places on the system is constant and does not grow as the cluster expands. It converges well and adds no extra network overhead. It places very little load on the disk where the journal resides; it is a high-performance, low-latency journaling file system obtained at the cost of a relatively high false-kill rate.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, device, or component); when executed, it performs one or a combination of the steps of the method embodiments.
Alternatively, all or some of the steps of the above embodiments may be implemented with integrated circuits: the steps may be made into separate integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module.
The apparatuses/functional modules/functional units in the above embodiments may be implemented on general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When the apparatuses/functional modules/functional units in the above embodiments are implemented as software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
Industrial Applicability
The embodiments of the present invention guarantee eventual high consistency of files after the distributed file system is reset and restarted, avoid data inconsistency between replicas caused by a machine crash and restart, and minimize the latency and performance cost of adding the journaling system.

Claims (11)

  1. A data processing method for a distributed file system, comprising:
    a file service client (FAC) acquiring file data and pushing it to a file data server (FAS);
    the FAS recording the file data pushed by the FAC, recording in a buffer the corresponding metadata modification on the FAS, writing it to a journal file, and returning a file data push completion message to the FAC;
    after receiving the file data push completion message returned by the FAS, the FAC sending a metadata modification change request to a file location register (FLR);
    the FLR modifying the corresponding metadata according to the metadata modification change request and recording it in the journaling file system; and
    when an abnormal restart of the FAS is detected, the FLR rolling back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
  2. The method according to claim 1, wherein the step of the FLR modifying the corresponding metadata according to the metadata modification change request and recording it in the journaling file system further comprises:
    the FLR adding the processed entries, in chronological order, to the buffer of the corresponding FAS.
  3. The method according to claim 1, wherein the step of, when an abnormal restart of the FAS is detected, the FLR rolling back the corresponding modified data according to the journal records and completing the repair of the journaling file system comprises:
    when an abnormal restart of the FAS is detected, the FLR rolling back the journaled modifications from the current point in the journal by a set length of time, the modifications within the set length of time corresponding to all modification records of the FAS;
    when the FAS powers on, sending a rollback request to the FLR to roll back the corresponding data; and
    the FLR rolling back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing the repair of the journaling file system.
  4. The method according to claim 1, 2, or 3, wherein the step of the FLR monitoring the FAS for abnormality comprises:
    the FLR receiving heartbeat messages periodically sent by the FAS; and
    when several consecutive heartbeat messages are detected as lost, determining that the FAS is abnormal.
  5. The method according to claim 4, wherein the step of the FAC, after receiving the file data push completion message returned by the FAS, sending a metadata modification change request to the FLR comprises:
    after receiving the file data push completion message returned by the FAS, the FAC placing the corresponding metadata modification change request into a pending-notification buffer; and
    when the configured timer expires, sending all metadata modification change requests in the pending-notification buffer to the FLR.
  6. A distributed file system, comprising a file service client (FAC), a file data server (FAS), and a file location register (FLR), wherein:
    the FAC is configured to acquire file data and push it to the FAS;
    the FAS is configured to record the file data pushed by the FAC, record in a buffer the corresponding metadata modification on the FAS, write it to a journal file, and return a file data push completion message to the FAC;
    the FAC is further configured to, after receiving the file data push completion message returned by the FAS, send a metadata modification change request to the FLR;
    the FLR is configured to modify the corresponding metadata according to the metadata modification change request and record it in the journaling file system; and
    the FLR is further configured to, when an abnormal restart of the FAS is detected, roll back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
  7. The system according to claim 6, wherein
    the FLR is further configured to add the processed entries, in chronological order, to the buffer of the corresponding FAS.
  8. The system according to claim 6, wherein
    the FLR is configured to, when an abnormal restart of the FAS is detected, roll back the journaled modifications from the current point in the journal by a set length of time, the modifications within the set length of time corresponding to all modification records of the FAS;
    the FAS is configured to, when the FAS powers on, send a rollback request to the FLR to roll back the corresponding data; and
    the FLR is configured to roll back the corresponding data to the buffer of the corresponding FAS according to the rollback request, completing the repair of the journaling file system.
  9. The system according to claim 6, 7, or 8, wherein
    the FLR is configured to receive heartbeat messages periodically sent by the FAS, and to determine that the FAS is abnormal when several consecutive heartbeat messages are detected as lost.
  10. The system according to claim 9, wherein
    the FAC is configured to, after receiving the file data push completion message returned by the FAS, place the corresponding metadata modification change request into a pending-notification buffer, and, when the configured timer expires, send all metadata modification change requests in the pending-notification buffer to the FLR.
  11. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1 to 5.
PCT/CN2015/076473 2014-10-24 2015-04-13 Data processing method for distributed file system and distributed file system WO2015184925A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410578968.2 2014-10-24
CN201410578968.2A CN105589887B (en) 2014-10-24 2014-10-24 Data processing method of distributed file system and distributed file system

Publications (1)

Publication Number Publication Date
WO2015184925A1 true WO2015184925A1 (en) 2015-12-10

Family

ID=54766145

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2015/072772 WO2016061956A1 (en) 2014-10-24 2015-02-11 Data processing method for distributed file system and distributed file system
PCT/CN2015/076473 WO2015184925A1 (en) 2014-10-24 2015-04-13 Data processing method for distributed file system and distributed file system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072772 WO2016061956A1 (en) 2014-10-24 2015-02-11 Data processing method for distributed file system and distributed file system

Country Status (2)

Country Link
CN (1) CN105589887B (en)
WO (2) WO2016061956A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021562B (en) * 2016-10-31 2022-11-18 中兴通讯股份有限公司 Disk storage method and device applied to distributed file system and distributed file system
CN106599046B (en) * 2016-11-09 2020-06-30 北京同有飞骥科技股份有限公司 Writing method and device of distributed file system
CN109284066B (en) * 2017-07-19 2022-09-30 阿里巴巴集团控股有限公司 Data processing method, device, equipment and system
CN109117093B (en) * 2018-08-20 2021-10-01 赛凡信息科技(厦门)有限公司 Method for ensuring consistency of data, flow and capacity in distributed object storage
CN111522688B (en) * 2019-02-01 2023-09-15 阿里巴巴集团控股有限公司 Data backup method and device for distributed system
CN110096358A (en) * 2019-04-11 2019-08-06 上海交通大学 Chain drive remote center distributed storage and distributed computing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7681072B1 (en) * 2004-08-13 2010-03-16 Panasas, Inc. Systems and methods for facilitating file reconstruction and restoration in data storage systems where a RAID-X format is implemented at a file level within a plurality of storage devices
CN102024016A (en) * 2010-11-04 2011-04-20 天津曙光计算机产业有限公司 Rapid data restoration method for distributed file system (DFS)
CN103077222A (en) * 2012-12-31 2013-05-01 中国科学院计算技术研究所 Method and system for ensuring consistence of distributed metadata in cluster file system
CN103279568A (en) * 2013-06-18 2013-09-04 无锡紫光存储系统有限公司 System and method for metadata management
US20130332418A1 (en) * 2012-06-08 2013-12-12 Electronics And Telecommunications Research Institute Method of managing data in asymmetric cluster file system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8762642B2 (en) * 2009-01-30 2014-06-24 Twinstrata Inc System and method for secure and reliable multi-cloud data replication
CN101916215B (en) * 2010-08-09 2012-02-01 哈尔滨工程大学 Operation intercept based repentance method of distributed critical task system
CN102833273B (en) * 2011-06-13 2017-11-03 中兴通讯股份有限公司 Data recovery method and distributed cache system during temporary derangement
CN102368267A (en) * 2011-10-25 2012-03-07 曙光信息产业(北京)有限公司 Method for keeping consistency of copies in distributed system
CN102662795A (en) * 2012-03-20 2012-09-12 浪潮电子信息产业股份有限公司 Metadata fault-tolerant recovery method in distributed storage system
CN102890716B (en) * 2012-09-29 2017-08-08 南京中兴新软件有限责任公司 The data back up method of distributed file system and distributed file system
CN103051681B (en) * 2012-12-06 2015-06-17 华中科技大学 Collaborative type log system facing to distribution-type file system
CN103198159B (en) * 2013-04-27 2016-01-06 国家计算机网络与信息安全管理中心 A kind of many copy consistency maintaining methods of isomeric group reformed based on affairs
CN103297268B (en) * 2013-05-13 2016-04-06 北京邮电大学 Based on the distributed data consistency maintenance system and method for P2P technology
CN103294787A (en) * 2013-05-21 2013-09-11 成都市欧冠信息技术有限责任公司 Multi-copy storage method and multi-copy storage system for distributed database system
CN103729436A (en) * 2013-12-27 2014-04-16 中国科学院信息工程研究所 Distributed metadata management method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143126A (en) * 2019-12-20 2020-05-12 浪潮电子信息产业股份有限公司 Data copying method, system and related components of distributed file system
CN114504828A (en) * 2022-02-08 2022-05-17 北京趣玩天橙科技有限公司 Method and system for realizing memory consistency through data rollback
CN114504828B (en) * 2022-02-08 2023-04-28 北京趣玩天橙科技有限公司 Method and system for realizing memory consistency by data rollback
CN117950597A (en) * 2024-03-22 2024-04-30 浙江大华技术股份有限公司 Data modification writing method, data modification writing device, and computer storage medium

Also Published As

Publication number Publication date
CN105589887A (en) 2016-05-18
WO2016061956A1 (en) 2016-04-28
CN105589887B (en) 2020-04-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15803882

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15803882

Country of ref document: EP

Kind code of ref document: A1