WO2016061956A1 - Data processing method for a distributed file system, and distributed file system - Google Patents

Info

Publication number
WO2016061956A1
Authority
WO
WIPO (PCT)
Prior art keywords
fas
flr
data
file
metadata
Prior art date
Application number
PCT/CN2015/072772
Inventor
朱鹏
林健
胡剑华
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016061956A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to the field of distributed file storage, and in particular to a data processing method for a distributed file system and to a distributed file system.
  • A distributed file system can deliver several times the throughput of an ordinary local file system while also providing high reliability: multiple-copy and redundant-copy techniques improve data reliability when a single machine fails. Compared with devices such as disk arrays, it also has the advantages of low cost and general-purpose hardware.
  • The main purpose of the embodiments of the present invention is to provide a data processing method for a distributed file system, and a distributed file system, that avoid the data inconsistency between multiple copies caused by a Fas crashing and restarting.
  • To achieve this, an embodiment of the present invention provides a data processing method for a distributed file system, including:
  • the Fac obtains file data and pushes it to the Fas;
  • the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac;
  • after receiving the file data push completion message returned by the Fas, the Fac sends a metadata modification change request to the Flr;
  • the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system;
  • when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • The step in which the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system further includes:
  • the Flr adds the processed entries, in time order, to the buffer of the corresponding Fas.
  • The step in which, when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records and completes the repair of the log file system includes:
  • when an abnormal restart of the Fas is detected, the Flr rolls the logged modification data back by a set length of time from the current time point of the log; the modification data within that set length of time corresponds to all modification records of the Fas;
  • when the Fas powers on, it sends a rollback request to the Flr to roll back the corresponding data;
  • the Flr rolls back the corresponding data to the buffer of the corresponding Fas according to the rollback request, completing the repair of the log file system.
  • The step in which the Flr monitors the Fas for abnormality includes:
  • the Flr receives heartbeat messages periodically sent by the Fas;
  • when several consecutive heartbeat messages are found to be lost, the Fas is determined to be abnormal.
  • The step in which the Fac, after receiving the file data push completion message returned by the Fas, sends a metadata modification change request to the Flr includes:
  • after receiving the file data push completion message returned by the Fas, the Fac places the corresponding metadata modification change request in the modification pending-notification buffer;
  • when the set timer expires, all metadata modification change requests in the modification pending-notification buffer are sent to the Flr.
  • An embodiment of the invention further provides a distributed file system, including a Fac, a Fas and a Flr, wherein:
  • the Fac is configured to obtain file data and push it to the Fas;
  • the Fas is configured to record the file data pushed by the Fac, record in a buffer the corresponding metadata modification on the Fas, write it to the log file, and return a file data push completion message to the Fac;
  • the Fac is further configured to send a metadata modification change request to the Flr after receiving the file data push completion message returned by the Fas;
  • the Flr is configured to modify the corresponding metadata according to the metadata modification change request and record the change to the log file system;
  • the Flr is further configured to roll back the corresponding modified data according to the log records when an abnormal restart of the Fas is detected, completing the repair of the log file system.
  • The Flr is further configured to add the processed entries, in time order, to the buffer of the corresponding Fas.
  • The Flr is further configured to, when an abnormal restart of the Fas is detected, roll the logged modification data back by a set length of time from the current time point of the log, the modification data within that set length of time corresponding to all modification records of the Fas;
  • the Fas is further configured to send a rollback request to the Flr when the Fas powers on, to roll back the corresponding data;
  • the Flr is further configured to roll back the corresponding data to the buffer of the corresponding Fas according to the rollback request, completing the repair of the log file system.
  • The Flr is further configured to receive heartbeat messages periodically sent by the Fas, and to determine that the Fas is abnormal when several consecutive heartbeat messages are found to be lost.
  • The Fac is further configured to place the corresponding metadata modification change request in the modification pending-notification buffer after receiving the file data push completion message returned by the Fas, and, when the set timer expires, to send all metadata modification change requests in that buffer to the Flr.
  • In the data processing method and distributed file system proposed by the embodiments of the present invention, the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids the data inconsistency between multiple copies caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the log system.
  • FIG. 1 is a schematic flowchart of an embodiment of the data processing method for a distributed file system according to the present invention;
  • FIG. 2 is a schematic diagram of the interaction flow between the Fac, the Fas and the Flr according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the interaction between the Fac and the Fas and of the Fas flush timing according to an embodiment of the present invention;
  • FIG. 4 is a schematic flowchart of the specific process by which the Fac sends metadata modification change requests to the Flr according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the processing flow of the Flr according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of an embodiment of the distributed file system according to the present invention.
  • The solution of the embodiments of the present invention is mainly as follows: the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • An embodiment of the present invention provides a data processing method for a distributed file system, including:
  • Step S101: the Fac obtains file data and pushes it to the Fas.
  • The system operating environment involved in the method embodiment of the present invention includes a Fac, a Fas and a Flr, wherein:
  • Fac: file service client, which connects the user to the internal data of the distributed file system.
  • Fas: file data server, which stores the actual data of files.
  • Flr: file location register, which stores the metadata that maps files to their data, and related information.
  • The solution of this embodiment proposes a lagging log file system scheme for the double-layer metadata case. Without reducing file system responsiveness, it provides all the characteristics of a lagging log file system and guarantees high consistency of files after the system is reset and restarted.
  • On the role of the log file, take local file systems as an example. The ext2 file system is a general-purpose file system without journaling. During a reset or power failure it is likely to lose some of the data being written or modified, leaving metadata and data inconsistent. To address this, the ext3 file system added a journal: at power-on, it restores file system consistency by replaying the journal.
  • Specifically, the double-layer metadata in this embodiment means that both the Flr and the Fas hold part of the metadata: the Flr holds the location and name information of a file's data slices, and the Fas holds the mapping between slice names and actual disk blocks.
  • Put simply, any distributed file system that keeps its own management metadata and is built on top of a local file system falls into this category of double-layer metadata distributed file systems.
  • In this embodiment, the role of the Fac is to send the relevant metadata modification change requests; for this it can rely on the existing functions of the original distributed file system.
  • The Fas function is built on the lower layer of the double-layer metadata. This part ensures that a valid log of metadata modification records can be kept on the Fas, guaranteeing consistency on the Fas side.
  • The Flr function is built on the upper layer of the double-layer metadata and mainly handles log replay and rollback after modifications to the upper metadata layer.
  • More specifically, the Fac first obtains file data and pushes it to the Fas for storage.
  • Step S102: the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac.
  • The Fas records the file data pushed by the Fac, at the same time records in the buffer the metadata modification on the Fas, and returns a file data push completion message to the Fac.
  • In addition, the Fas periodically flushes the modification buffer to the regular log file, ahead of the data.
  • After flushing data to disk, the Fas places the metadata-modification-complete notice for the successfully flushed data into the buffer, which is periodically flushed to the log file.
  • The interaction between the Fac and the Fas and the Fas flush timing can be as shown in FIG. 3. Taking the Fac sending data a and data b to the Fas as an example, the processing flow is as follows:
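  • Fac sends data a to Fas.
  • Fas inserts the notice of modifying data a into the modification buffer.
  • Fas writes data a to the data buffer.
  • Fas replies to Fac that data a has been written successfully. (From this point on, metadata modification notifications to Flr are enabled.)
  • Fac sends data b to Fas.
  • Fas inserts the notice of modifying data b into the modification buffer.
  • Fas writes data b to the data buffer.
  • Fas replies to Fac that data b has been written successfully. (Steps 5 to 8 handle a different piece of data; this shows the speed of asynchronous notification.)
  • The timed log task flushes, and the modification notices for a and b are written to disk.
  • The data of b is written to disk.
  • The timed log task flushes, and the write completion notices for a and b are written to disk.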
  • Step S103: after receiving the file data push completion message returned by the Fas, the Fac sends a metadata modification change request to the Flr.
  • After receiving the file data push completion message returned by the Fas, the Fac sends a metadata modification change request to the Flr, with the data needed by the log file system attached to the request.
  • After receiving the file data push completion message returned by the Fas, the Fac places the corresponding metadata modification change request in the modification pending-notification buffer.
  • The specific processing flow by which the Fac sends metadata modification change requests to the Flr can be as shown in FIG. 4.
  • Once triggered, the metadata synchronization message is sent to the Flr and the timer is reset.
  • Step S104: the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system.
  • After receiving the metadata modification, the Flr changes the corresponding metadata and, using the attached log-related data, records the modification of that metadata in the log system. Meanwhile, the Fas flushes the data to disk and writes the log once the write is confirmed successful.
  • The Flr adds the processed entries, in time order, to the buffer of the corresponding Fas.
  • Step S105: when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • The Flr monitors whether the Fas is abnormal by receiving heartbeat messages periodically sent by the Fas.
  • The Fas periodically sends a still-alive message to indicate that it is still working.
  • While heartbeat messages from the Fas are being received, the Fas is judged to be normal; when several consecutive heartbeat messages are lost, the Fas is determined to be abnormal.
  • The Flr performs the rollback according to the log records: the logged modification data is rolled back by a set length of time from the current time point, and the modification data within that length of time corresponds to all modification records of the Fas, that is, the data modification changes reported by the Fac.
  • When the Fas powers on, it sends a rollback request to the Flr to roll back the corresponding data; the Flr rolls back the corresponding data to the buffer of the corresponding Fas according to the rollback request, completing the repair of the log file system.
  • The processing flow of the Flr in this embodiment can be as shown in FIG. 5.
  • When one of the Fas restarts abnormally, the log system enters the repair process. The process is first triggered on the Flr: once the Flr confirms that a Fas has restarted, the log system uses the log records on the Flr to roll back all modification records for that Fas over a set length of time. Meanwhile, when the Fas powers on, it uses its locally recorded log to roll back data that had been written to the Fas but not yet flushed to disk, and sends a rollback request to the Flr to roll back the corresponding data.
  • The system thus provides all the characteristics of a lagging log file system without reducing file system responsiveness, guaranteeing high consistency of files after the system is reset and restarted.
  • With the above solution, the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids the data inconsistency between multiple copies caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the log system.
  • The log system is insensitive to, and uncorrelated with, the scale of the distributed system: the system load is constant, and the load on the log system does not grow as the cluster expands. It converges well and adds no overhead on the network.
  • the pressure on the disk where the log system resides is extremely small, which is a high-performance, low-latency log file system at the expense of high error rate.
  • An embodiment of the present invention provides a distributed file system, including a Fac 201, a Fas 202 and a Flr 203, wherein:
  • the Fac 201 is configured to obtain file data and push it to the Fas 202;
  • the Fas 202 is configured to record the file data pushed by the Fac 201, record in a buffer the corresponding metadata modification on the Fas 202, write it to the log file, and return a file data push completion message to the Fac 201;
  • the Fac 201 is further configured to send a metadata modification change request to the Flr 203 after receiving the file data push completion message returned by the Fas 202;
  • the Flr 203 is configured to modify the corresponding metadata according to the metadata modification change request and record the change to the log file system;
  • the Flr 203 is further configured to roll back the corresponding modified data according to the log records when an abnormal restart of the Fas 202 is detected, completing the repair of the log file system.
  • Fac 201: file service client, which connects the user to the internal data of the distributed file system.
  • Fas 202: file data server, which stores the actual data of files.
  • Flr 203: file location register, which stores the metadata that maps files to their data, and related information.
  • The solution of this embodiment proposes a lagging log file system scheme for the double-layer metadata case. Without reducing file system responsiveness, it provides all the characteristics of a lagging log file system and guarantees high consistency of files after the system is reset and restarted.
  • On the role of the log file, take local file systems as an example. The ext2 file system is a general-purpose file system without journaling. During a reset or power failure it is likely to lose some of the data being written or modified, leaving metadata and data inconsistent. To address this, the ext3 file system added a journal: at power-on, it restores file system consistency by replaying the journal.
  • Specifically, the double-layer metadata in this embodiment means that both the Flr 203 and the Fas 202 hold part of the metadata: the Flr 203 holds the location and name information of a file's data slices, and the Fas 202 holds the mapping between slice names and actual disk blocks.
  • Put simply, any distributed file system that keeps its own management metadata and is built on top of a local file system falls into this category of double-layer metadata distributed file systems.
  • In this embodiment, the role of the Fac 201 is to send the relevant metadata modification change requests; for this it can rely on the existing functions of the original distributed file system.
  • The Fas 202 function is built on the lower layer of the double-layer metadata. This part ensures that a valid log of metadata modification records can be kept on the Fas 202, guaranteeing consistency on the Fas 202 side.
  • The Flr 203 function is built on the upper layer of the double-layer metadata and mainly handles log replay and rollback after modifications to the upper metadata layer.
  • The interaction flow between the Fac 201, the Fas 202 and the Flr 203 in the system can be as shown in FIG. 2.
  • The Fac 201 obtains file data and pushes it to the Fas 202 for storage.
  • The Fas 202 records the file data pushed by the Fac 201, at the same time records in the buffer the metadata modification on the Fas 202, and returns a file data push completion message to the Fac 201.
  • In addition, the Fas 202 periodically flushes the modification buffer to the regular log file, ahead of the data.
  • After flushing data to disk, the Fas 202 places the metadata-modification-complete notice for the successfully flushed data into the buffer, which is periodically flushed to the log file.
  • The interaction between the Fac 201 and the Fas 202 and the Fas 202 flush timing can be as shown in FIG. 3. Taking the Fac 201 sending data a and data b to the Fas 202 as an example, the processing flow is as follows:
  • Fac 201 sends data a to Fas 202.
  • Fas 202 inserts the notice of modifying data a into the modification buffer.
  • Fas 202 writes data a to the data buffer.
  • Fas 202 replies to Fac 201 that data a has been written successfully. (From this point on, metadata modification notifications to Flr 203 are enabled.)
  • Fac 201 sends data b to Fas 202.
  • Fas 202 inserts the notice of modifying data b into the modification buffer.
  • Fas 202 writes data b to the data buffer.
  • Fas 202 replies to Fac 201 that data b has been written successfully. (Steps 5 to 8 handle a different piece of data; this shows the speed of asynchronous notification.)
  • The timed log task flushes, and the modification notices for a and b are written to disk.
  • The data of b is written to disk.
  • The timed log task flushes, and the write completion notices for a and b are written to disk.
  • After receiving the file data push completion message returned by the Fas 202, the Fac 201 sends a metadata modification change request to the Flr 203, with the data needed by the log file system attached to the request.
  • After receiving the file data push completion message returned by the Fas 202, the Fac 201 places the corresponding metadata modification change request in the modification pending-notification buffer.
  • For example, the Fac 201 sends to the Flr 203 the metadata modification change requests for data a, data b, data c and data d; the specific processing flow by which the Fac 201 sends metadata modification change requests to the Flr 203 can be as shown in FIG. 4.
  • Once triggered, the metadata synchronization message is sent to the Flr 203 and the timer is reset.
  • The Flr 203 modifies the corresponding metadata and, using the attached log-related data, records the modification of that metadata in the log system.
  • The Fas 202 flushes the data to disk and writes the log once the write is confirmed successful.
  • The Flr 203 adds the processed entries, in chronological order, to the buffer of the corresponding Fas 202.
  • When an abnormal restart of the Fas 202 is detected, the Flr 203 rolls back the corresponding modified data according to the log records, completing the repair of the log file system.
  • The Flr 203 monitors whether the Fas 202 is abnormal by receiving heartbeat messages periodically sent by the Fas 202.
  • The Fas 202 periodically sends a still-alive message to indicate that it is still working.
  • While heartbeat messages from the Fas 202 are being received, the Fas 202 is judged to be normal; when several consecutive heartbeat messages are lost, the Fas 202 is determined to be abnormal.
  • The Flr 203 takes no action on heartbeat messages that arrive from the Fas 202. If heartbeat messages are lost several times in succession, however, the Flr 203 must apply lagging handling to the Fas 202 whose heartbeats were lost, so that if the Fas 202 has really crashed and reset, the related operations are rolled back.
  • The Flr 203 performs the rollback according to the log records: the logged modification data is rolled back by a set length of time from the current time point of the log, and the modification data within that set length of time corresponds to all modification records of the Fas 202, that is, the data modification changes reported by the Fac.
  • When the Fas 202 powers on, it sends a rollback request to the Flr 203 to roll back the corresponding data; the Flr 203 rolls back the corresponding data to the buffer of the corresponding Fas 202 according to the rollback request, completing the repair of the log file system.
  • The processing flow of the Flr 203 in this embodiment can be as shown in FIG. 5.
  • When one of the Fas 202 restarts abnormally, the log system enters the repair process. The process is first triggered on the Flr 203: once the Flr 203 confirms that a Fas 202 has restarted, the log system uses the log records on the Flr 203 to roll back all modification records for that Fas 202 over a set length of time. Meanwhile, when the Fas 202 powers on, it uses its locally recorded log to roll back data that had been written to the Fas 202 but not yet flushed to disk, and sends a rollback request to the Flr 203 to roll back the corresponding data.
  • The system thus provides all the characteristics of a lagging log file system without reducing file system responsiveness, guaranteeing high consistency of files after the system is reset and restarted.
  • With the above solution, the Fac 201 obtains file data and pushes it to the Fas 202; the Fas 202 records the file data pushed by the Fac 201, records in a buffer the corresponding metadata modification on the Fas 202, writes it to the log file, and returns a file data push completion message to the Fac 201; after receiving that message, the Fac 201 sends a metadata modification change request to the Flr 203; the Flr 203 modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas 202 is detected, the Flr 203 rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids the data inconsistency between multiple copies caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the log system.
  • The log system is insensitive to, and uncorrelated with, the scale of the distributed system: the system load is constant, and the load on the log system does not grow as the cluster expands. It converges well and adds no overhead on the network.
  • the pressure on the disk where the log system resides is extremely small, which is a high-performance, low-latency log file system at the expense of high error rate.
  • The disclosed device can be implemented in other ways.
  • The device embodiments described above are merely illustrative; in an actual implementation the units may be divided differently, and some features may be omitted.
  • The coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
  • The invention discloses a data processing method for a distributed file system, and a distributed file system. The Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids the data inconsistency between multiple copies caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the log system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)
  • Debugging And Monitoring (AREA)

Abstract

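A data processing method for a distributed file system, and a distributed file system. The method includes: the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records.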

Description

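Data processing method for a distributed file system, and distributed file system
Technical Field
The present invention relates to the field of distributed file storage, and in particular to a data processing method for a distributed file system and to a distributed file system.
Background
With the rapid development of the multimedia industry, and for reasons of cost, reliability and more, a growing number of vendors choose to deploy self-developed distributed upper-layer storage systems in their products, and distributed file systems have developed rapidly as a result. A distributed file system can deliver several times or more the throughput of an ordinary local file system while also providing high reliability: multiple-copy and redundant-copy techniques improve data reliability when a single machine fails. Compared with devices such as disk arrays, it also has the advantages of low cost and general-purpose hardware.
At present, some distributed file systems emphasize throughput at the cost of weaker file system consistency guarantees, while others guarantee synchronous consistency but greatly reduce write and modify performance. With the large number of machines in a distributed system, crashes and restarts are routine, so it is essential to guarantee that the data in the multiple copies of a file remains consistent after a server crashes and restarts.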
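Summary of the Invention
The main purpose of the embodiments of the present invention is to provide a data processing method for a distributed file system, and a distributed file system, that avoid the data inconsistency between multiple copies caused by a Fas crashing and restarting.
To achieve this, an embodiment of the present invention proposes a data processing method for a distributed file system, including:
the Fac obtains file data and pushes it to the Fas;
the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac;
after receiving the file data push completion message returned by the Fas, the Fac sends a metadata modification change request to the Flr;
the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system;
when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system.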
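Preferably, the step in which the Flr modifies the corresponding metadata according to the metadata modification change request and records the change to the log file system further includes:
the Flr adds the processed entries, in time order, to the buffer of the corresponding Fas.
Preferably, the step in which, when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records and completes the repair of the log file system includes:
when an abnormal restart of the Fas is detected, the Flr rolls the logged modification data back by a set length of time from the current time point of the log; the modification data within that set length of time corresponds to all modification records of the Fas;
when the Fas powers on, it sends a rollback request to the Flr to roll back the corresponding data;
the Flr rolls back the corresponding data to the buffer of the corresponding Fas according to the rollback request, completing the repair of the log file system.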
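The time-window rollback described above can be sketched as follows. This is only an illustration under stated assumptions: the names `FlrJournal`, `LogEntry`, `apply` and `rollback`, the use of an in-memory dictionary for metadata, and the idea of storing the previous value in each log entry are all hypothetical and are not specified by the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

_MISSING = object()  # marks "this key did not exist before the change"


@dataclass
class LogEntry:
    timestamp: float   # when the Flr applied the change
    key: str           # metadata key, e.g. a slice name (hypothetical)
    old_value: object  # previous value, kept so the change can be undone


@dataclass
class FlrJournal:
    """Hypothetical per-Fas journal on the Flr; entries are kept in time order."""
    metadata: Dict[Tuple[str, str], object] = field(default_factory=dict)
    entries: Dict[str, List[LogEntry]] = field(default_factory=dict)

    def apply(self, fas_id, key, new_value, now):
        """Modify metadata and append the change to the buffer of that Fas."""
        old = self.metadata.get((fas_id, key), _MISSING)
        self.entries.setdefault(fas_id, []).append(LogEntry(now, key, old))
        self.metadata[(fas_id, key)] = new_value

    def rollback(self, fas_id, now, window):
        """Undo, newest first, every change for fas_id made within `window`
        time units before `now`; return how many changes were undone."""
        log = self.entries.get(fas_id, [])
        undone = 0
        while log and log[-1].timestamp >= now - window:
            entry = log.pop()
            if entry.old_value is _MISSING:
                self.metadata.pop((fas_id, entry.key), None)
            else:
                self.metadata[(fas_id, entry.key)] = entry.old_value
            undone += 1
        return undone


journal = FlrJournal()
journal.apply("fas1", "slice-a", "v1", now=10.0)
journal.apply("fas1", "slice-a", "v2", now=95.0)
journal.apply("fas1", "slice-b", "v1", now=98.0)
journal.apply("fas2", "slice-c", "v1", now=99.0)
undone = journal.rollback("fas1", now=100.0, window=10.0)
```

Only the entries of the restarted Fas are touched; changes recorded for other Fas nodes stay in place, which matches the per-Fas buffers described in the text.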
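Preferably, the step in which the Flr monitors the Fas for abnormality includes:
the Flr receives heartbeat messages periodically sent by the Fas;
when several consecutive heartbeat messages are found to be lost, the Fas is determined to be abnormal.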
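A minimal sketch of the consecutive-missed-heartbeat check follows. The class name `HeartbeatMonitor`, the `max_missed` threshold of 3, and the explicit `on_interval_elapsed` call (instead of a real timer) are assumptions made for illustration; the source only says "several consecutive" losses.

```python
class HeartbeatMonitor:
    """Hypothetical Flr-side monitor that counts consecutive missed heartbeats."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold, not given in the source
        self.missed = {}              # Fas id -> consecutive missed intervals

    def register(self, fas_id):
        self.missed[fas_id] = 0

    def on_heartbeat(self, fas_id):
        # A received heartbeat needs no other processing: just reset the count.
        self.missed[fas_id] = 0

    def on_interval_elapsed(self):
        """Call once per heartbeat period; return Fas ids newly judged abnormal."""
        abnormal = []
        for fas_id in self.missed:
            self.missed[fas_id] += 1
            if self.missed[fas_id] == self.max_missed:
                abnormal.append(fas_id)
        return abnormal


monitor = HeartbeatMonitor(max_missed=3)
monitor.register("fas1")
monitor.register("fas2")
reports = []
for _ in range(3):
    monitor.on_heartbeat("fas1")   # fas1 stays alive, fas2 is silent
    reports.append(monitor.on_interval_elapsed())
```

A Fas reported here would then be the trigger for the rollback on the Flr side.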
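Preferably, the step in which the Fac, after receiving the file data push completion message returned by the Fas, sends a metadata modification change request to the Flr includes:
after receiving the file data push completion message returned by the Fas, the Fac places the corresponding metadata modification change request in the modification pending-notification buffer;
when the set timer expires, all metadata modification change requests in the modification pending-notification buffer are sent to the Flr.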
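The buffer-then-flush behaviour on the Fac can be sketched like this. `PendingNotifyBuffer`, its `send` callback and the explicit `now` arguments are hypothetical; the timer is simulated by passing timestamps so the example stays deterministic, and arming the timer on the first buffered request is an assumption.

```python
class PendingNotifyBuffer:
    """Hypothetical Fac-side buffer that batches metadata change requests."""

    def __init__(self, send, interval):
        self.send = send            # callable that delivers a batch to the Flr
        self.interval = interval    # assumed flush period
        self.pending = []
        self.next_flush = None      # timer is armed by the first buffered request

    def add(self, request, now):
        """Called after the Fas confirms that the data push completed."""
        if self.next_flush is None:
            self.next_flush = now + self.interval
        self.pending.append(request)

    def tick(self, now):
        """Send the whole batch if the timer has expired; return number sent."""
        if self.next_flush is None or now < self.next_flush:
            return 0
        batch, self.pending = self.pending, []
        self.next_flush = None      # reset the timer after sending
        self.send(batch)
        return len(batch)


sent_batches = []
buf = PendingNotifyBuffer(send=sent_batches.append, interval=5.0)
buf.add({"data": "a"}, now=0.0)
buf.add({"data": "b"}, now=1.0)
early = buf.tick(now=4.0)
on_time = buf.tick(now=5.0)
```

Batching keeps the acknowledgement to the user independent of the Flr round trip, which is consistent with the low-latency goal the text states.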
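An embodiment of the present invention further proposes a distributed file system, including a Fac, a Fas and a Flr, wherein:
the Fac is configured to obtain file data and push it to the Fas;
the Fas is configured to record the file data pushed by the Fac, record in a buffer the corresponding metadata modification on the Fas, write it to the log file, and return a file data push completion message to the Fac;
the Fac is further configured to send a metadata modification change request to the Flr after receiving the file data push completion message returned by the Fas;
the Flr is configured to modify the corresponding metadata according to the metadata modification change request and record the change to the log file system;
the Flr is further configured to roll back the corresponding modified data according to the log records when an abnormal restart of the Fas is detected, completing the repair of the log file system.
Preferably, the Flr is further configured to add the processed entries, in time order, to the buffer of the corresponding Fas.
Preferably, the Flr is further configured to, when an abnormal restart of the Fas is detected, roll the logged modification data back by a set length of time from the current time point of the log, the modification data within that set length of time corresponding to all modification records of the Fas;
the Fas is further configured to send a rollback request to the Flr when the Fas powers on, to roll back the corresponding data;
the Flr is further configured to roll back the corresponding data to the buffer of the corresponding Fas according to the rollback request, completing the repair of the log file system.
Preferably, the Flr is further configured to receive heartbeat messages periodically sent by the Fas, and to determine that the Fas is abnormal when several consecutive heartbeat messages are found to be lost.
Preferably, the Fac is further configured to place the corresponding metadata modification change request in the modification pending-notification buffer after receiving the file data push completion message returned by the Fas, and, when the set timer expires, to send all metadata modification change requests in that buffer to the Flr.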
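In the data processing method and distributed file system proposed by the embodiments of the present invention, the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records in a buffer the corresponding metadata modification on the Fas, writes it to the log file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata modification change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change to the log file system; when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the log records, completing the repair of the log file system. This guarantees high eventual consistency of files after the distributed file system is reset and restarted, avoids the data inconsistency between multiple copies caused by machine crashes and restarts, and minimizes the latency and performance loss introduced by adding the log system.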
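Brief Description of the Drawings
Figure 1 is a schematic flowchart of an embodiment of the data processing method for a distributed file system of the present invention;
Figure 2 is a schematic diagram of the interaction flow among the Fac, the Fas, and the Flr in an embodiment of the present invention;
Figure 3 is a schematic diagram of the interaction between the Fac and the Fas, and of the Fas flush timing, in an embodiment of the present invention;
Figure 4 is a schematic diagram of the detailed processing flow by which the Fac sends metadata change requests to the Flr in an embodiment of the present invention;
Figure 5 is a schematic diagram of the processing flow of the Flr in an embodiment of the present invention;
Figure 6 is a schematic diagram of the architecture of an embodiment of the distributed file system of the present invention.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the drawings.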
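Detailed Description
The solution of the embodiments of the present invention is mainly as follows: the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records the corresponding metadata modification on the Fas in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change in the journaling file system; and when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system. This ensures high eventual consistency of files after the distributed file system is reset and restarted, avoids the inconsistency among replicas caused by machine crashes and restarts, and keeps the latency and performance cost of adding the journal to a minimum.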
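As shown in Figure 1, an embodiment of the present invention provides a data processing method for a distributed file system, including:
Step S101: the Fac obtains file data and pushes it to the Fas;
The system environment involved in this method embodiment includes a Fac, a Fas, and an Flr, where:
Fac: the file service client, which connects the user to the data inside the distributed file system.
Fas: the file data server, which stores the actual data of files.
Flr: the file location register, which stores related information such as the metadata that maps files to their data.
At present, some distributed file systems focus on throughput at the expense of their consistency guarantees and offer no protection comparable to that of a local journaling file system, while others guarantee synchronous consistency but greatly reduce write and modify performance. Existing solutions cannot guarantee that the data in a file's multiple replicas remains consistent after a server crashes and restarts.
This embodiment proposes a deferred (lagging) journaling approach for the two-tier metadata case. Without reducing the responsiveness of the file system, it provides all the features of a deferred journaling file system and ensures high consistency of files after the system is reset and restarted.
On the role of the journal: taking local file systems as an example, ext2 is a general-purpose file system without journaling. During a reset or power loss it may well lose data that is being written or modified, leaving metadata and data inconsistent. The ext3 file system addresses this problem by adding a journal: at power-on it replays the journal to restore the consistency of the file system.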
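The replay idea described above can be illustrated with a minimal Python sketch. It is a generic redo-style journal and does not reproduce the actual on-disk format of ext3; the record tuples (`"write"`, `"commit"`) and the `replay` function are assumptions introduced purely for illustration.

```python
def replay(journal, disk):
    """Apply every committed transaction in `journal` to `disk` (a dict)."""
    pending = {}  # transaction id -> list of (block, value) not yet committed
    for record in journal:
        if record[0] == "write":
            _, txid, block, value = record
            pending.setdefault(txid, []).append((block, value))
        elif record[0] == "commit":
            # Only a committed transaction is allowed to reach the disk image.
            for block, value in pending.pop(record[1], []):
                disk[block] = value
    return disk  # writes of uncommitted transactions are simply dropped


# Hypothetical journal: transaction 1 committed, transaction 2 was cut off.
journal = [
    ("write", 1, "inode7", "size=4096"),
    ("commit", 1),
    ("write", 2, "inode9", "size=512"),
]
disk = replay(journal, {})
print(disk)
```

In this sketch the interrupted transaction leaves no trace after replay, which is the property that keeps metadata and data consistent after a power loss.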
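Specifically, the two-tier metadata in this embodiment means that both the Flr and the Fas hold part of the metadata: the Flr holds the location and name information of a file's data fragments, and the Fas holds the mapping between fragment names and actual disk blocks. Put simply, any distributed file system that is built on top of a local file system and keeps its own management metadata belongs to this category of two-tier metadata distributed file systems.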
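A minimal Python sketch of the two tiers may make the split concrete. The dictionary layout, the fragment names such as `x.0`, the server ids, and the `locate` helper are all assumptions for illustration; the source does not specify any concrete data structure.

```python
# Upper tier, held by the Flr: file name -> ordered (fragment name, Fas id) pairs.
flr_metadata = {"x": [("x.0", "fas1"), ("x.1", "fas2")]}

# Lower tier, held by each Fas: fragment name -> disk block numbers.
fas_metadata = {
    "fas1": {"x.0": [10, 11]},
    "fas2": {"x.1": [42]},
}


def locate(file_name):
    """Resolve a file to (Fas id, fragment, blocks) triples using both tiers."""
    result = []
    for fragment, fas_id in flr_metadata[file_name]:
        blocks = fas_metadata[fas_id][fragment]
        result.append((fas_id, fragment, blocks))
    return result


print(locate("x"))
```

Because each tier lives on a different machine, a crash of one Fas can leave its lower-tier entries out of step with the upper tier, which is the situation the journal described here is meant to repair.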
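In this embodiment, the role of the Fac is to send the relevant metadata change requests; for this it can rely on the existing functions of the original distributed file system.
The Fas function is built on the lower tier of the two-tier metadata. It allows an effective journal of metadata modification records to be built on the Fas, ensuring consistency on the Fas side.
The Flr is built on the upper tier of the two-tier metadata and mainly handles journal replay and rollback after upper-tier metadata has been modified.
The interaction flow among the Fac, the Fas, and the Flr in the system may be as shown in Figure 2.
More specifically, the Fac first obtains file data and pushes it to the Fas for storage.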
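Step S102: the Fas records the file data pushed by the Fac, records the corresponding metadata modification on the Fas in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac;
The Fas records the file data pushed by the Fac, at the same time records in a buffer the metadata modification made on the Fas, and returns a file data push completion message to the Fac.
In addition, the Fas periodically flushes the modification buffer to the regular journal file, ahead of the data itself.
After flushing the data to disk, the Fas places a completion record for each successfully flushed metadata modification into the buffer, which is periodically flushed to the journal file.
The interaction between the Fac and the Fas, and the Fas flush timing, may be as shown in Figure 3.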
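Taking the case where the Fac sends data a and data b to the Fas as an example, the processing flow is as follows:
1. The Fac sends data a to the Fas.
2. The Fas inserts a notification that data a is being modified into the modification buffer.
3. The Fas writes data a into the data buffer.
4. The Fas replies to the Fac, notifying it that a has been written successfully. (From this point on, sending of the metadata modification notification to the Flr begins.)
5. The Fac sends data b to the Fas.
6. The Fas inserts a notification that data b is being modified into the modification buffer.
7. The Fas writes data b into the data buffer.
8. The Fas replies to the Fac, notifying it that b has been written successfully. (Steps 5 to 8 handle a different piece of data; this shows the speed of asynchronous notification.)
9. The timed journal task flushes: the modification notifications for a and b are written to disk.
10. The data of a is written to disk.
11. A completion notification for writing a to disk is inserted into the modification buffer.
12. The data of b is written to disk.
13. A completion notification for writing b to disk is inserted into the modification buffer.
14. The timed journal task flushes: the write-to-disk completion notifications for a and b are written to disk.
At this point the complete journal sequence has been written, and the journal on the Fas side is complete.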
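The fourteen steps above can be sketched as a toy Python model. This is only an illustration under stated assumptions: the class `FasJournal`, its method names, and the `"intent"`/`"done"` record labels are hypothetical, and in-memory lists and dicts stand in for the journal file and the disk.

```python
class FasJournal:
    """Toy model of the Fas-side journal; containers stand in for disk files."""

    def __init__(self):
        self.mod_buffer = []    # journal records not yet flushed
        self.data_buffer = {}   # data accepted but not yet on disk
        self.journal_disk = []  # journal records already flushed to disk
        self.data_disk = {}     # data already flushed to disk

    def receive(self, key, data):
        """Steps 1-4 (and 5-8): buffer the intent and the data, then acknowledge."""
        self.mod_buffer.append(("intent", key))
        self.data_buffer[key] = data
        return "push-complete"

    def flush_journal(self):
        """Steps 9 and 14: the timed journal task writes the buffered records."""
        self.journal_disk.extend(self.mod_buffer)
        self.mod_buffer.clear()

    def flush_data(self):
        """Steps 10-13: write data to disk, then buffer a completion record."""
        for key in list(self.data_buffer):
            self.data_disk[key] = self.data_buffer.pop(key)
            self.mod_buffer.append(("done", key))


fas = FasJournal()
ack_a = fas.receive("a", b"AAAA")
ack_b = fas.receive("b", b"BBBB")
fas.flush_journal()  # intents reach disk before the data does
fas.flush_data()
fas.flush_journal()  # completion records reach disk last
print(fas.journal_disk)
```

The acknowledgement is returned before anything touches the disk, which is why the Fac sees low latency; the ordering of the three flushes is what lets a restarted Fas tell finished writes from unfinished ones.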
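Step S103: after receiving the file data push completion message returned by the Fas, the Fac sends a metadata change request to the Flr;
After receiving the file data push completion message returned by the Fas, the Fac sends a metadata change request to the Flr and attaches the journal-related data to the request.
As a preferred implementation, the Fac may send metadata change requests to the Flr as follows:
After receiving the file data push completion message returned by the Fas, the Fac places the corresponding metadata change request in the pending-notification buffer.
When the configured timer interval expires, all metadata change requests in the pending-notification buffer are sent to the Flr.
Taking the case where the Fac sends the Flr metadata change requests for data a, data b, data c, and data d as an example, the detailed processing flow may be as shown in Figure 4.
1. After writing file x, the Fac places the modification of a in the pending-notification buffer;
2. After writing file x, the Fac places the modification of b in the pending-notification buffer;
3. After writing file x, the Fac places the modification of c in the pending-notification buffer;
4. After writing file y, the Fac places the modification of d in the pending-notification buffer.
If a check at this point finds that the required time interval has already elapsed but the timer has not yet fired, the Fac sends the metadata synchronization message to the Flr immediately and resets the timer.
Otherwise, when the timer fires some time later, the messages in the pending-notification buffer are sent to the Flr and the timer is reset. This approach greatly reduces the number of control messages sent to the Flr, while the short interval keeps the notifications as close to real time as possible.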
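The batching behaviour described above can be sketched in Python as follows. The class `PendingNotifications`, the `send` callback, and the explicit `now` arguments (used here in place of a real clock and timer so the example is deterministic) are assumptions for illustration, not details given in the source.

```python
class PendingNotifications:
    """Fac-side buffer that batches metadata change requests for the Flr."""

    def __init__(self, send, interval, now=0.0):
        self.send = send                 # callable that takes a list of requests
        self.interval = interval         # seconds between batches
        self.deadline = now + interval   # when the next batch is due
        self.pending = []

    def add(self, request, now):
        """Queue a request, then check whether the interval has already elapsed."""
        self.pending.append(request)
        self.tick(now)

    def tick(self, now):
        """Called after every write and by the timer; sends the batch when due."""
        if now >= self.deadline:
            if self.pending:
                self.send(list(self.pending))
                self.pending.clear()
            self.deadline = now + self.interval  # reset the timer


sent = []
buf = PendingNotifications(sent.append, interval=1.0)
buf.add(("x", "a"), now=0.1)
buf.add(("x", "b"), now=0.2)
buf.add(("x", "c"), now=0.3)
buf.add(("y", "d"), now=1.2)  # interval elapsed: a, b, c and d go out together
print(sent)
```

Four writes produce a single message to the Flr, so the load on the Flr depends on the interval rather than on the write rate.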
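Step S104: the Flr modifies the corresponding metadata according to the metadata change request and records the change in the journaling file system;
After receiving the metadata change, the Flr modifies the corresponding metadata and, using the attached journal-related data, records the metadata modification in the journal system. Meanwhile, the Fas flushes the data to disk and flushes the journal once the write is confirmed successful.
In addition, the Flr adds the processed entries, in chronological order, to the buffer of the corresponding Fas.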
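Step S105: when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
The Flr monitors whether a Fas is abnormal by receiving the heartbeat messages that the Fas sends periodically.
The Fas periodically sends a "still alive" message to indicate that it is still working.
When a heartbeat message from the Fas is received, the Fas is judged to be normal; when several consecutive heartbeat messages are found to be missing, the Fas is judged to be abnormal.
The Flr takes no action on the heartbeat messages it receives from a Fas. However, if heartbeats from a Fas are lost consecutively, the Flr must apply deferred handling to that Fas, so that if the Fas really has crashed and reset, the related operations are rolled back.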
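The heartbeat rule can be sketched in Python as a counter of consecutive misses. The class name `HeartbeatMonitor`, the method names, and the threshold of three are hypothetical; the source only says "several consecutive" missed heartbeats.

```python
class HeartbeatMonitor:
    """Flr-side view of Fas liveness based on consecutive missed heartbeats."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold, not given in the source
        self.missed = {}              # Fas id -> consecutive missed heartbeats

    def on_heartbeat(self, fas_id):
        """A 'still alive' message arrived: the Fas is normal, reset its counter."""
        self.missed[fas_id] = 0

    def on_interval_elapsed(self, fas_id):
        """Call once per heartbeat period in which no heartbeat arrived.

        Returns True when the Fas should be judged abnormal.
        """
        self.missed[fas_id] = self.missed.get(fas_id, 0) + 1
        return self.missed[fas_id] >= self.max_missed


monitor = HeartbeatMonitor(max_missed=3)
monitor.on_heartbeat("fas1")
results = [monitor.on_interval_elapsed("fas1") for _ in range(3)]
print(results)
```

Requiring several misses in a row keeps a single dropped packet from triggering a rollback, at the price of a short delay before a real crash is noticed.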
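Specifically, when an abnormal restart of the Fas is detected, the Flr performs a rollback according to the journal records: starting from the current point in time, it rolls the journaled modifications back by a specific length of time. The modifications within that length of time correspond to all modification records of the Fas, that is, the data changes reported by the Fac.
When the Fas powers on, it sends a rollback request to the Flr to roll back the corresponding data; the Flr, according to the rollback request, rolls the corresponding data back to the buffer of the corresponding Fas, completing the repair of the journaling file system.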
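A minimal Python sketch of the time-window rollback on the Flr side follows. It assumes an undo-style journal that stores the previous value of each entry; the class `FlrJournal`, its fields, and the `window` parameter are illustrative names, and the source does not state how old values are kept.

```python
from collections import defaultdict


class FlrJournal:
    """Upper-tier metadata plus a per-Fas, time-ordered undo journal."""

    def __init__(self):
        self.metadata = {}                # fragment name -> location info
        self.per_fas = defaultdict(list)  # Fas id -> [(time, key, old value)]

    def apply(self, fas_id, key, value, now):
        """Modify the metadata and journal the previous value for this Fas."""
        self.per_fas[fas_id].append((now, key, self.metadata.get(key)))
        self.metadata[key] = value

    def rollback(self, fas_id, now, window):
        """Undo, newest first, every change for fas_id made within `window`."""
        entries = self.per_fas[fas_id]
        undone = []
        while entries and entries[-1][0] >= now - window:
            _, key, old = entries.pop()
            if old is None:
                self.metadata.pop(key, None)  # the key did not exist before
            else:
                self.metadata[key] = old
            undone.append(key)
        return undone


flr = FlrJournal()
flr.apply("fas1", "x.0", "v1", now=1.0)
flr.apply("fas1", "x.0", "v2", now=9.0)
flr.apply("fas2", "y.0", "v1", now=9.5)
undone = flr.rollback("fas1", now=10.0, window=5.0)
print(undone, flr.metadata)
```

Only the entries of the restarted Fas inside the window are undone; changes belonging to other servers, and older changes of the same server, are left alone.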
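The processing flow of the Flr in this embodiment may be as shown in Figure 5.
When one of the Fas machines crashes and restarts abnormally, the journal system enters its repair flow. The flow is first triggered on the Flr: once the Flr confirms that a Fas has restarted, the journal system uses the journal records on the Flr to roll back all modification records of that Fas within the specific length of time. Meanwhile, when that Fas powers on, it uses its locally recorded journal to roll back the data that was written to the Fas but not yet written to disk, and sends a rollback request to the Flr to roll back the corresponding data.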
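The Fas-side half of the repair can be sketched in Python as a scan of the local journal for writes that were started but never completed. The `"intent"`/`"done"` record labels and the function name `incomplete_writes` are assumptions for illustration; the source does not define the journal record format.

```python
def incomplete_writes(journal_records):
    """Return keys that have an 'intent' record but no later 'done' record."""
    pending = []
    for kind, key in journal_records:
        if kind == "intent":
            pending.append(key)
        elif kind == "done" and key in pending:
            pending.remove(key)
    return pending


# Hypothetical journal found on disk after a crash: b never reached the disk.
journal_on_disk = [("intent", "a"), ("intent", "b"), ("done", "a")]
to_roll_back = incomplete_writes(journal_on_disk)
print(to_roll_back)
```

The keys returned here are the ones for which the restarted Fas would send a rollback request to the Flr.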
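When both flows have finished, the repair is complete. During the repair, the system continues to serve consistent data from the other replicas, so the repair is invisible to users.
Without reducing the responsiveness of the file system, this system provides all the features of a deferred journaling file system and ensures high consistency of files after the system is reset and restarted.
Compared with the prior art, in this embodiment the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records the corresponding metadata modification on the Fas in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change in the journaling file system; and when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system. This ensures high eventual consistency of files after the distributed file system is reset and restarted, avoids the inconsistency among replicas caused by machine crashes and restarts, and keeps the latency and performance cost of adding the journal to a minimum.
This journal system is insensitive to, and independent of, the scale of the distributed system: the load it places on the system is constant and does not grow as the cluster expands. It converges well and adds no extra network overhead. The load on the disk that holds the journal is very small. It is a high-performance, low-latency journaling file system whose cost is a relatively high "false kill" rate, that is, some modifications are rolled back unnecessarily.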
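As shown in Figure 6, an embodiment of the present invention provides a distributed file system, including a Fac 201, a Fas 202, and an Flr 203, where:
the Fac 201 is configured to obtain file data and push it to the Fas 202;
the Fas 202 is configured to record the file data pushed by the Fac 201, record the corresponding metadata modification on the Fas 202 in a buffer, write it to a journal file, and return a file data push completion message to the Fac 201;
the Fac 201 is further configured to send a metadata change request to the Flr 203 after receiving the file data push completion message returned by the Fas 202;
the Flr 203 is configured to modify the corresponding metadata according to the metadata change request and record the change in the journaling file system;
the Flr 203 is further configured to, when an abnormal restart of the Fas 202 is detected, roll back the corresponding modified data according to the journal records, completing the repair of the journaling file system.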
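Specifically, Fac 201: the file service client, which connects the user to the data inside the distributed file system.
Fas 202: the file data server, which stores the actual data of files.
Flr 203: the file location register, which stores related information such as the metadata that maps files to their data.
At present, some distributed file systems focus on throughput at the expense of their consistency guarantees and offer no protection comparable to that of a local journaling file system, while others guarantee synchronous consistency but greatly reduce write and modify performance. Existing solutions cannot guarantee that the data in a file's multiple replicas remains consistent after a server crashes and restarts.
This embodiment proposes a deferred (lagging) journaling approach for the two-tier metadata case. Without reducing the responsiveness of the file system, it provides all the features of a deferred journaling file system and ensures high consistency of files after the system is reset and restarted.
On the role of the journal: taking local file systems as an example, ext2 is a general-purpose file system without journaling. During a reset or power loss it may well lose data that is being written or modified, leaving metadata and data inconsistent. The ext3 file system addresses this problem by adding a journal: at power-on it replays the journal to restore the consistency of the file system.
Specifically, the two-tier metadata in this embodiment means that both the Flr 203 and the Fas 202 hold part of the metadata: the Flr 203 holds the location and name information of a file's data fragments, and the Fas 202 holds the mapping between fragment names and actual disk blocks. Put simply, any distributed file system that is built on top of a local file system and keeps its own management metadata belongs to this category of two-tier metadata distributed file systems.
In this embodiment, the role of the Fac 201 is to send the relevant metadata change requests; for this it can rely on the existing functions of the original distributed file system.
The Fas 202 function is built on the lower tier of the two-tier metadata. It allows an effective journal of metadata modification records to be built on the Fas 202, ensuring consistency on the Fas 202 side.
The Flr 203 is built on the upper tier of the two-tier metadata and mainly handles journal replay and rollback after upper-tier metadata has been modified.
The interaction flow among the Fac 201, the Fas 202, and the Flr 203 in the system may be as shown in Figure 2.
More specifically, the Fac 201 first obtains file data and pushes it to the Fas 202 for storage.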
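The Fas 202 records the file data pushed by the Fac 201, at the same time records in a buffer the metadata modification made on the Fas 202, and returns a file data push completion message to the Fac 201.
In addition, the Fas 202 periodically flushes the modification buffer to the regular journal file, ahead of the data itself.
After flushing the data to disk, the Fas 202 places a completion record for each successfully flushed metadata modification into the buffer, which is periodically flushed to the journal file.
The interaction between the Fac 201 and the Fas 202, and the Fas 202 flush timing, may be as shown in Figure 3.
Taking the case where the Fac 201 sends data a and data b to the Fas 202 as an example, the processing flow is as follows:
1. The Fac 201 sends data a to the Fas 202.
2. The Fas 202 inserts a notification that data a is being modified into the modification buffer.
3. The Fas 202 writes data a into the data buffer.
4. The Fas 202 replies to the Fac 201, notifying it that a has been written successfully. (From this point on, sending of the metadata modification notification to the Flr 203 begins.)
5. The Fac 201 sends data b to the Fas 202.
6. The Fas 202 inserts a notification that data b is being modified into the modification buffer.
7. The Fas 202 writes data b into the data buffer.
8. The Fas 202 replies to the Fac 201, notifying it that b has been written successfully. (Steps 5 to 8 handle a different piece of data; this shows the speed of asynchronous notification.)
9. The timed journal task flushes: the modification notifications for a and b are written to disk.
10. The data of a is written to disk.
11. A completion notification for writing a to disk is inserted into the modification buffer.
12. The data of b is written to disk.
13. A completion notification for writing b to disk is inserted into the modification buffer.
14. The timed journal task flushes: the write-to-disk completion notifications for a and b are written to disk.
At this point the complete journal sequence has been written, and the journal on the Fas 202 side is complete.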
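After receiving the file data push completion message returned by the Fas 202, the Fac 201 sends a metadata change request to the Flr 203 and attaches the journal-related data to the request.
As a preferred implementation, the Fac 201 may send metadata change requests to the Flr 203 as follows:
After receiving the file data push completion message returned by the Fas 202, the Fac 201 places the corresponding metadata change request in the pending-notification buffer.
When the configured timer interval expires, all metadata change requests in the pending-notification buffer are sent to the Flr 203.
Taking the case where the Fac 201 sends the Flr 203 metadata change requests for data a, data b, data c, and data d as an example, the detailed processing flow may be as shown in Figure 4.
1. After writing file x, the Fac 201 places the modification of a in the pending-notification buffer;
2. After writing file x, the Fac 201 places the modification of b in the pending-notification buffer;
3. After writing file x, the Fac 201 places the modification of c in the pending-notification buffer;
4. After writing file y, the Fac 201 places the modification of d in the pending-notification buffer.
If a check at this point finds that the required time interval has already elapsed but the timer has not yet fired, the Fac 201 sends the metadata synchronization message to the Flr 203 immediately and resets the timer.
Otherwise, when the timer fires some time later, the messages in the pending-notification buffer are sent to the Flr 203 and the timer is reset. This approach greatly reduces the number of control messages sent to the Flr 203, while the short interval keeps the notifications as close to real time as possible.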
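After receiving the metadata change, the Flr 203 modifies the corresponding metadata and, using the attached journal-related data, records the metadata modification in the journal system. Meanwhile, the Fas 202 flushes the data to disk and flushes the journal once the write is confirmed successful.
In addition, the Flr 203 adds the processed entries, in chronological order, to the buffer of the corresponding Fas 202.
When an abnormal restart of the Fas 202 is detected, the Flr 203 rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
The Flr 203 monitors whether the Fas 202 is abnormal by receiving the heartbeat messages that the Fas 202 sends periodically.
The Fas 202 periodically sends a "still alive" message to indicate that it is still working.
When a heartbeat message from the Fas 202 is received, the Fas 202 is judged to be normal; when several consecutive heartbeat messages are found to be missing, the Fas 202 is judged to be abnormal.
The Flr 203 takes no action on the heartbeat messages it receives from the Fas 202. However, if heartbeats from a Fas 202 are lost consecutively, the Flr 203 must apply deferred handling to that Fas 202, so that if the Fas 202 really has crashed and reset, the related operations are rolled back.
Specifically, when an abnormal restart of the Fas 202 is detected, the Flr 203 performs a rollback according to the journal records: it rolls the journaled modifications back from the current point in time of the journal by a set length of time. The modifications within that set length of time correspond to all modification records of the Fas 202, that is, the data changes reported by the Fac.
When the Fas 202 powers on, it sends a rollback request to the Flr 203 to roll back the corresponding data; the Flr 203, according to the rollback request, rolls the corresponding data back to the buffer of the corresponding Fas 202, completing the repair of the journaling file system.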
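The processing flow of the Flr 203 in this embodiment may be as shown in Figure 5.
When one of the Fas 202 machines crashes and restarts abnormally, the journal system enters its repair flow. The flow is first triggered on the Flr 203: once the Flr 203 confirms that a Fas 202 has restarted, the journal system uses the journal records on the Flr 203 to roll back all modification records of that Fas 202 within the specific length of time. Meanwhile, when that Fas 202 powers on, it uses its locally recorded journal to roll back the data that was written to the Fas 202 but not yet written to disk, and sends a rollback request to the Flr 203 to roll back the corresponding data.
When both flows have finished, the repair is complete. During the repair, the system continues to serve consistent data from the other replicas, so the repair is invisible to users.
Without reducing the responsiveness of the file system, this system provides all the features of a deferred journaling file system and ensures high consistency of files after the system is reset and restarted.
Compared with the prior art, in this embodiment the Fac 201 obtains file data and pushes it to the Fas 202; the Fas 202 records the file data pushed by the Fac 201, records the corresponding metadata modification on the Fas 202 in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac 201; after receiving that message, the Fac 201 sends a metadata change request to the Flr 203; the Flr 203 modifies the corresponding metadata according to the request and records the change in the journaling file system; and when an abnormal restart of the Fas 202 is detected, the Flr 203 rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system. This ensures high eventual consistency of files after the distributed file system is reset and restarted, avoids the inconsistency among replicas caused by machine crashes and restarts, and keeps the latency and performance cost of adding the journal to a minimum.
The journal system in the embodiments of the present invention is insensitive to, and independent of, the scale of the distributed system: the load it places on the system is constant and does not grow as the cluster expands. It converges well and adds no extra network overhead. The load on the disk that holds the journal is very small. It is a high-performance, low-latency journaling file system whose cost is a relatively high "false kill" rate, that is, some modifications are rolled back unnecessarily.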
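In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. The device embodiments described above are merely illustrative; in an actual implementation there may be other ways of dividing the components, and some features may be omitted. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
The above is only a specific implementation of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.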
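Industrial Applicability
The present invention discloses a data processing method for a distributed file system, and a distributed file system, in which the Fac obtains file data and pushes it to the Fas; the Fas records the file data pushed by the Fac, records the corresponding metadata modification on the Fas in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac; after receiving that message, the Fac sends a metadata change request to the Flr; the Flr modifies the corresponding metadata according to the request and records the change in the journaling file system; and when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system. This ensures high eventual consistency of files after the distributed file system is reset and restarted, avoids the inconsistency among replicas caused by machine crashes and restarts, and keeps the latency and performance cost of adding the journal to a minimum.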

Claims (10)

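  1. A data processing method for a distributed file system, including:
    a file service client (Fac) obtains file data and pushes it to a file data server (Fas);
    the Fas records the file data pushed by the Fac, records the corresponding metadata modification on the Fas in a buffer, writes it to a journal file, and returns a file data push completion message to the Fac;
    after receiving the file data push completion message returned by the Fas, the Fac sends a metadata change request to a file location register (Flr);
    the Flr modifies the corresponding metadata according to the metadata change request and records the change in the journaling file system;
    when an abnormal restart of the Fas is detected, the Flr rolls back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
  2. The method according to claim 1, wherein the Flr modifying the corresponding metadata according to the metadata change request and recording the change in the journaling file system includes:
    the Flr adds the processed entries, in chronological order, to the buffer of the corresponding Fas.
  3. The method according to claim 1, wherein, when an abnormal restart of the Fas is detected, the Flr rolling back the corresponding modified data according to the journal records and completing the repair of the journaling file system includes:
    when an abnormal restart of the Fas is detected, the Flr, according to the journal records, rolls the journaled modifications back from the current point in time of the journal by a set length of time, where the modifications within that set length of time correspond to all modification records of the Fas;
    when the Fas powers on, it sends a rollback request to the Flr to roll back the corresponding data;
    the Flr, according to the rollback request, rolls the corresponding data back to the buffer of the corresponding Fas, completing the repair of the journaling file system.
  4. The method according to any one of claims 1 to 3, wherein the Flr detecting a Fas abnormality includes:
    the Flr receives the heartbeat messages that the Fas sends periodically;
    when several consecutive heartbeat messages are found to be missing, the Fas is judged to be abnormal.
  5. The method according to claim 4, wherein the Fac, after receiving the file data push completion message returned by the Fas, sending a metadata change request to the Flr includes:
    after receiving the file data push completion message returned by the Fas, the Fac places the corresponding metadata change request in a pending-notification buffer;
    when the configured timer interval expires, all metadata change requests in the pending-notification buffer are sent to the Flr.
  6. A distributed file system, including a file service client (Fac), a file data server (Fas), and a file location register (Flr), wherein:
    the Fac is configured to obtain file data and push it to the Fas, and, after receiving the file data push completion message returned by the Fas, to send a metadata change request to the Flr;
    the Fas is configured to record the file data pushed by the Fac, record the corresponding metadata modification on the Fas in a buffer, write it to a journal file, and return a file data push completion message to the Fac;
    the Flr is configured to modify the corresponding metadata according to the metadata change request and record the change in the journaling file system, and, when an abnormal restart of the Fas is detected, to roll back the corresponding modified data according to the journal records, completing the repair of the journaling file system.
  7. The system according to claim 6, wherein:
    the Flr is configured to add the processed entries, in chronological order, to the buffer of the corresponding Fas.
  8. The system according to claim 6, wherein:
    the Flr is configured to, when an abnormal restart of the Fas is detected, roll the journaled modifications back from the current point in time of the journal by a set length of time according to the journal records, where the modifications within that set length of time correspond to all modification records of the Fas, and to roll the corresponding data back to the buffer of the corresponding Fas according to the rollback request, completing the repair of the journaling file system;
    the Fas is configured to send a rollback request to the Flr when the Fas powers on, in order to roll back the corresponding data.
  9. The system according to any one of claims 6 to 8, wherein:
    the Flr is configured to receive the heartbeat messages that the Fas sends periodically and, when several consecutive heartbeat messages are found to be missing, to judge the Fas to be abnormal.
  10. The system according to claim 9, wherein:
    the Fac is configured to place the corresponding metadata change request in a pending-notification buffer after receiving the file data push completion message returned by the Fas and, when the configured timer interval expires, to send all metadata change requests in the pending-notification buffer to the Flr.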
PCT/CN2015/072772 2014-10-24 2015-02-11 Data processing method for a distributed file system, and distributed file system WO2016061956A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410578968.2A CN105589887B (zh) 2014-10-24 2014-10-24 Data processing method for a distributed file system, and distributed file system
CN201410578968.2 2014-10-24

Publications (1)

Publication Number Publication Date
WO2016061956A1 (zh) 2016-04-28

Family

ID=54766145

Family Applications (2)

Application Number Title Priority Date Filing Date
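PCT/CN2015/072772 WO2016061956A1 (zh) 2014-10-24 2015-02-11 Data processing method for a distributed file system, and distributed file system
PCT/CN2015/076473 WO2015184925A1 (zh) 2014-10-24 2015-04-13 Data processing method for a distributed file system, and distributed file system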

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/076473 WO2015184925A1 (zh) 2014-10-24 2015-04-13 Data processing method for a distributed file system, and distributed file system

Country Status (2)

Country Link
CN (1) CN105589887B (zh)
WO (2) WO2016061956A1 (zh)

Also Published As

Publication number Publication date
CN105589887B (zh) 2020-04-03
CN105589887A (zh) 2016-05-18
WO2015184925A1 (zh) 2015-12-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15853443

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15853443

Country of ref document: EP

Kind code of ref document: A1