WO2012075845A1

WO2012075845A1 - Distributed file system

Info

Publication number: WO2012075845A1
Application number: PCT/CN2011/079685
Authority: WO
Inventors: 张辉; 范家星; 姜南; 吴波
Original assignee: 华为技术有限公司
Priority date: 2010-12-08
Filing date: 2011-09-15
Publication date: 2012-06-14
Also published as: CN102024044B; CN102024044A

Abstract

Provided is a distributed file system, including an access module (11) for the distributed file system, a metadata management unit (12), a primary storage system (13), a standby storage system (14), and an external storage system (15). Among them, the access module (11), the metadata management unit (12), the primary storage system (13) and the standby storage system (14) are connected to each other via a system bus; and the external storage system (15) is connected to the primary storage system (13) and the standby storage system (14) via a network. In the distributed file system provided by the present invention, by writing data into the primary storage system (13) and backing up the same into the standby storage system (14) using an asynchronous backup mechanism, the read-write performance of high speed equipment will not be affected, online data recovery can be achieved, and data recovery can be automatically realized during data external service, accelerating the recovery process.

Description

Distributed file system

Embodiments of the present invention relate to data backup technologies, and in particular, to a distributed file system.

Background

As the Internet continues to expand, applications in various industries are multiplying, streaming media applications in particular. The performance and reliability of content delivery servers in a Content Delivery Network (CDN) are becoming more and more important, and ever higher performance and reliability are demanded of the core file system that hosts the application data.

For reliability, Redundant Array of Independent Disks (RAID) technology is the main approach, and its redundant backup feature provides the guarantee. Simply put, RAID combines multiple independent hard disks (physical hard disks) in different ways to form a disk group (a logical hard disk), which provides higher storage performance than a single hard disk and also provides data backup. For performance, the prior art generally uses the storage striping management of a distributed file system to aggregate the bandwidth of multiple hard disks. Striping is a management function that spreads data across multiple storage devices in a fixed step size, so that on reading, the data is fetched from multiple physical storage devices in parallel and the performance of those devices is superimposed. The data redundancy and data striping features of RAID technology ensure high reliability and high performance of distributed file systems.
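As a minimal sketch of the striping idea described above, assuming simple round-robin placement (the function name `locate_stripe` and its parameters are illustrative and not taken from the source), a logical byte offset can be mapped to a device index and an offset on that device:

```python
def locate_stripe(offset, stripe_size, device_count):
    """Map a logical byte offset to (device_index, offset_on_device).

    Assumes round-robin striping: stripe 0 on device 0, stripe 1 on
    device 1, and so on, wrapping around after the last device.
    """
    if offset < 0 or stripe_size <= 0 or device_count <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    stripe_index = offset // stripe_size      # which stripe the byte falls in
    device_index = stripe_index % device_count
    row = stripe_index // device_count        # full rounds already written
    return device_index, row * stripe_size + offset % stripe_size
```

With a stripe size of 4 bytes and 3 devices, consecutive stripes land on devices 0, 1, 2, 0, and so on, which is what allows a large read to be served by several devices in parallel.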

In general, a distributed file system can be understood as being built on a network storage system. As the demand for high performance increases, hard disks are increasingly being replaced by solid state disks (SSDs). Because SSDs are extremely expensive, using RAID technology to back up SSDs sharply increases server cost. RAID is commonly implemented as RAID1 or RAID5. RAID1 is level 1 RAID and uses full mirroring: it requires two homogeneous storage systems that perform read and write operations simultaneously and mirror each other, so the system keeps working normally even if one disk is damaged. RAID5 is a storage solution that balances storage performance, data security, and storage cost. For backup, the data and the corresponding parity information are stored across the disks that make up the RAID5 array, with the parity information and its corresponding data stored on different disks. When the data on one disk of a RAID5 array is damaged, the remaining data and the corresponding parity information are used to recover the damaged data.

In the process of implementing the present invention, the inventors have found that at least the following problems exist in the prior art:

In a CDN, the use of RAID technology has the following disadvantages. For the same capacity, RAID needs more disks; RAID1 in particular requires twice as many disks, and because SSDs are extremely expensive, the storage cost makes the system cost too high. With RAID5, any data modification requires the parity to be rewritten, so writes are somewhat slower, and the data recovery time is long, which may affect the service. Moreover, recovery of the backed-up data is limited by the number of damaged disks.

Summary of the invention

An embodiment of the present invention provides a distributed file system, including: an access module, a metadata management unit connected to the access module, and a primary storage system and a backup storage system each connected to the metadata management unit. The access module is further connected to the primary storage system and to the backup storage system. The access module, the metadata management unit, the primary storage system, and the backup storage system are connected to one another through a system bus. The distributed file system further includes an external storage system connected to the primary storage system and the backup storage system through a network. In this system:

The access module is configured to receive a read/write data request, send a metadata request to the metadata management unit to acquire the metadata corresponding to the requested data, and use the metadata to read data from or write data to the primary storage system or the backup storage system;

The metadata management unit is configured to, when the access module requests the metadata, find the location of the requested data on the primary storage system or the backup storage system, construct the metadata, and return it to the access module;

The primary storage system is configured to provide the requested data to the access module when the distributed file system is in a normal state;

The backup storage system is configured to provide data backup for the primary storage system when the distributed file system is in an abnormal state or a recovery state;

The external storage system is configured to provide data backup for the primary storage system.

In the distributed file system provided by the embodiment of the present invention, data is written to the primary storage system and backed up to the backup storage system using an asynchronous backup mechanism, so the read and write performance of the high-speed devices is not affected. Online data recovery can be achieved, and data is recovered automatically while the system continues to provide external service, which accelerates the recovery process.

Brief description of the drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the composition of a distributed file system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the composition of a distributed file system according to another embodiment of the present invention;

FIG. 3 is a schematic diagram of the processing flow of a distributed file system in a normal state according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a distributed file system responding to a read data request in an abnormal state according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a distributed file system responding to a write data request in an abnormal state according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of a distributed file system responding to a read data request in a recovery state according to an embodiment of the present invention;

FIG. 7 is a schematic flowchart of a distributed file system responding to a write data request in a recovery state according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the processing flow of the data recovery process of a distributed file system in a recovery state according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1 is a schematic diagram of the composition of a distributed file system according to an embodiment of the present invention. As shown in FIG. 1, the distributed file system 1 includes an access module 11, a metadata management unit 12 connected to the access module 11, and a primary storage system 13 and a backup storage system 14 each connected to the metadata management unit 12. The access module 11 is also connected to the primary storage system 13 and to the backup storage system 14. The access module 11, the metadata management unit 12, the primary storage system 13, and the backup storage system 14 are all located in the internal network, and these functional modules are connected by a system bus. The distributed file system 1 further includes an external storage system 15 located in the external network, which is connected to the primary storage system 13 and the backup storage system 14 through a network.

The access module 11 in the distributed file system 1 is configured to receive a read/write data request, send a metadata request to the metadata management unit 12 to obtain the metadata corresponding to the requested data, and use the metadata to read data from or write data to the primary storage system 13 or the backup storage system 14.

The access module 11 is the entry point through which data in the distributed file system 1 is accessed. It receives read and write data requests from applications, acquires metadata from the metadata management unit 12, and uses the metadata information to read and write data on the primary storage system 13 and the backup storage system 14. As a functional module, it can be deployed on a single processing unit, such as a PC or a board. There are generally multiple access modules, and at least one, so that the system can provide high throughput externally.

The metadata management unit 12 is configured to, when the access module 11 requests the metadata, find the location of the requested data on the primary storage system 13 or the backup storage system 14, construct the metadata, and return it to the access module 11. It is also configured to switch the system state of the distributed file system 1 according to device status events of the primary storage system 13. The distributed file system 1 has three states: the normal state, the abnormal state, and the recovery state.

In the normal state, no storage device of the primary storage system 13 is faulty; data is stored on the primary storage system 13 and is backed up to the backup storage system 14 when necessary.

In the abnormal state, a storage device in the primary storage system 13 has failed. In this case, the primary storage system 13 and the backup storage system 14 cooperate to complete data storage, and the result of this coordination is saved for use in the data recovery process.

In the recovery state, the failed storage device of the primary storage system 13 has been restored, which triggers system data recovery: the backup data on the backup storage system 14 is used to restore the data originally stored on the failed storage device.

Further, the metadata management unit 12 is responsible for finding the location of the data on the storage systems and constructing the metadata when the access module 11 requests it, and for returning the metadata to the access module 11. The metadata management unit 12 is also responsible for managing the system state: it receives device status events from the primary storage system 13, switches the system state, selects the storage system, and determines the data storage device (location) according to the event information. It is also responsible for system reliability, automatically backing up data based on data access information. When a device fails, it automatically assembles usable metadata according to how the data is distributed across the primary, backup, and external storage systems, to keep the system available; it is also responsible for automatic online recovery, to ensure data availability and data consistency. The metadata management unit 12 can be deployed separately on one server.

The primary storage system 13 is configured to provide the requested data to the access module 11 in the normal state. Specifically, the primary storage system 13 is the main storage of the distributed file system 1 and stores all data of the system, with high read/write performance as its goal. In the normal state, the system reads and writes data on the primary storage system 13, and all data is stored on it. The primary storage system 13 is composed of high-speed storage devices, uses storage striping to aggregate storage bandwidth and improve read and write performance, and provides a block access mode, so the access module 11 can read and write the data stored on it directly in blocks.

The backup storage system 14 is configured to provide data backup for the primary storage system 13 in the abnormal state and the recovery state. Specifically, the backup storage system 14 is the secondary storage of the distributed file system 1. It backs up data to support system reliability, availability, and data recoverability, uses storage striping to aggregate storage bandwidth and improve read and write performance, and provides a block access mode, so the access module 11 can read and write the data stored on it directly in blocks.

The external storage system 15 is configured to provide data backup for the primary storage system 13. Specifically, the external storage system 15 is a storage system that stores more data, and may be an upper-layer system or another system at the same layer. The external storage system 15 is connected to the other modules of the system through the network, is read and written using network access, and serves as a supplementary external data backup. If the primary storage system 13 or the backup storage system 14 cannot find the corresponding data, it can request the data from the external storage system 15.

The internal network mentioned above is the network that directly connects the access module 11, the metadata management unit 12, the primary storage system 13, and the backup storage system 14; it may be an Ethernet or an internal bus (e.g., a PCIe bus). Metadata is mainly information that describes data attributes and supports functions such as indicating storage location, history information, resource information, and file records. For example, the metadata may be the storage address at which the requested data (identified by the length and offset in the request) is saved, which may include a storage device number, a storage block number, and a data offset. The metadata may also be the index node (inode) number of the data. The inode of the data is stored in the storage device and identifies the storage address of the data; the storage address of the inode can be calculated from the inode number of the data.

FIG. 2 is a schematic diagram of the composition of a distributed file system according to another embodiment of the present invention. Based on the foregoing embodiment, as shown in FIG. 2, the distributed file system 1 includes an access module 11, a metadata management unit 12, a primary storage system 13, a backup storage system 14, and an external storage system 15. Further, the metadata management unit 12 includes a metadata operation module, a backup module, and an abnormality recovery module, wherein:

The metadata operation module is connected to the access module 11 and to the primary storage system 13. It is configured to receive the metadata request, request the metadata from the primary storage system 13 or from the abnormality recovery module, and return the metadata to the access module 11. It is further configured to update the system state according to the device status events reported by the primary storage system 13, to manage the system state, and to record the data control information carried in read/write data requests. Specifically, the metadata operation module receives all metadata requests from the access module 11 and first searches its metadata cache. If the metadata is not found, it requests the metadata from the primary storage system 13 or the backup storage system 14, caches the metadata obtained, and finally returns the metadata to the access module 11. In the normal state, it searches the metadata cache; on a miss, it requests the metadata from the primary storage system 13, records the data control information in the received metadata request, and then returns the metadata to the access module 11. Data control information can be recorded in memory or persisted to a database. In this embodiment, the metadata management unit 12 may include a data control information recording module, connected between the metadata operation module and the backup module, for persistently storing the data control information.

In the abnormal state, the metadata operation module forwards the metadata request sent by the access module 11 to the abnormality recovery module, which is responsible for obtaining the metadata information from the primary storage system 13 and the backup storage system 14.

In the recovery state, the write data process is the same as in the normal state: the metadata operation module requests the metadata from the primary storage system 13. The read data process is handed to the abnormality recovery module for processing.

The metadata operation module is responsible for state management, and the primary storage system 13 is responsible for reporting device status events. When a device failure event is received, the ID of the faulty device is recorded and the system state changes from normal to abnormal. When a device recovery event is received, the system state changes from abnormal to recovery. When the recovery operation is completed and the abnormality recovery module has notified the metadata operation module, the system state changes from recovery to normal.
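The state management can be sketched as a small state machine with three transitions: normal to abnormal on a device failure, abnormal to recovery on a device recovery, and recovery to normal once the recovery operation completes. This is a minimal illustration only; the class name `StateManager`, the event strings, and the choice to ignore events that do not apply in the current state are assumptions, not details from the source.

```python
from enum import Enum


class SystemState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    RECOVERY = "recovery"


class StateManager:
    """Hypothetical state tracker for the metadata operation module."""

    _TRANSITIONS = {
        (SystemState.NORMAL, "device_failure"): SystemState.ABNORMAL,
        (SystemState.ABNORMAL, "device_recovery"): SystemState.RECOVERY,
        (SystemState.RECOVERY, "recovery_complete"): SystemState.NORMAL,
    }

    def __init__(self):
        self.state = SystemState.NORMAL
        self.faulty_device_ids = set()

    def handle_event(self, event, device_id=None):
        next_state = self._TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Assumption: events that do not apply in this state are ignored.
            return self.state
        if event == "device_failure" and device_id is not None:
            self.faulty_device_ids.add(device_id)  # record the faulty device ID
        if event == "recovery_complete":
            self.faulty_device_ids.clear()
        self.state = next_state
        return self.state
```

Keeping the transitions in one table makes it easy to check that the only path back to the normal state goes through the recovery state.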

The backup module is connected to the metadata operation module and to the backup storage system 14. It is configured to read the data control information recorded by the metadata operation module, generate data backup operation control information, and send it to the backup storage system 14, so that the data on the primary storage system 13 is backed up to the backup storage system 14. Specifically, in the normal state, the module runs as a background thread. It reads the data control information recorded by the metadata operation module during the preceding period, analyzes data usage according to the backup policy, and generates data backup requests. It then issues the backup operation control information (operation type, target file path, source file path) to the backup storage system 14, requiring the file on the primary storage system 13 to be backed up to a specified location on the backup storage system 14. Combined with the backup policy, the backup module can flexibly implement a variety of data backup schemes. In a full backup scheme, only write data operations are analyzed, and a backup request is generated whenever data is written. In a hotspot backup scheme, only read usage is analyzed (number of read requests, read frequency), and a backup request is generated for data that meets the hotspot condition in the policy (which may be a read count or a read frequency). In a specified-data backup scheme, the policy specifies data feature codes; during analysis, the records containing a specified feature code are selected and backup requests are generated from them.
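The policy-driven analysis can be illustrated with a short sketch that turns recorded data control information into backup requests under the three schemes mentioned (full, hotspot, specified data). All names here (`AccessRecord`, `BackupRequest`, `plan_backups`, the `/backup` target root, and the threshold default) are hypothetical and chosen only for illustration; paths are assumed to start with `/`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    op: str                 # "read" or "write"
    path: str               # source file path on the primary storage system
    feature_code: str = ""  # optional data feature code


@dataclass(frozen=True)
class BackupRequest:
    op_type: str
    source_path: str
    target_path: str


def plan_backups(records, policy, hot_read_threshold=3,
                 feature_codes=(), target_root="/backup"):
    """Select paths to back up according to the (assumed) policy name."""
    if policy == "full":
        # Full backup: every written file is backed up.
        selected = {r.path for r in records if r.op == "write"}
    elif policy == "hotspot":
        # Hotspot backup: only files read at least hot_read_threshold times.
        reads = Counter(r.path for r in records if r.op == "read")
        selected = {p for p, n in reads.items() if n >= hot_read_threshold}
    elif policy == "specified":
        # Specified-data backup: only files carrying a listed feature code.
        selected = {r.path for r in records if r.feature_code in feature_codes}
    else:
        raise ValueError(f"unknown backup policy: {policy}")
    return [BackupRequest("backup", p, target_root + p) for p in sorted(selected)]
```

Each returned request carries the three fields the text lists as backup operation control information: operation type, source file path, and target file path.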

The abnormality recovery module is connected to the metadata operation module, the backup module, the primary storage system 13, and the backup storage system 14. It is configured to acquire the metadata in the abnormal state and the recovery state and return it to the metadata operation module. The abnormality recovery module is responsible for obtaining metadata information in the abnormal and recovery states, for maintaining the validity of the metadata cache, and for controlling the data recovery operations of the primary storage system 13, so that the system remains available when the primary storage system 13 has a fault.

The abnormality recovery module may include the following submodules:

a first processing submodule, configured to, for a read data request in the abnormal state, perform missing block detection on the metadata obtained from the cache or from the primary storage system 13; if the metadata is found to have a missing block, send request information containing the missing block information and the address of the external storage system 15 to the backup storage system 14; and, after the backup storage system 14 returns its metadata, perform a block address rebinding operation to reassemble usable metadata and return it. Specifically, in the abnormal state, for a read data request, the abnormality recovery module first searches its metadata cache through the first processing submodule. If no metadata is found, it requests the metadata from the primary storage system 13 and then performs missing block detection on it (a missing block is one whose data is stored on the faulty storage device, detected by storage device ID). If a missing block is found, it requests metadata from the backup storage system 14; the request information includes the missing block information (block number, data ID, offset) and the address of the external storage system 15. After the backup storage system 14 returns the metadata, the block address rebinding operation is performed. Block address rebinding means replacing the metadata information for the missing block on the primary storage system 13 with the metadata information for the corresponding block on the backup storage system 14, thereby reassembling usable metadata, and caching that metadata to speed up later metadata acquisition.
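Missing block detection and block address rebinding can be sketched as follows. The sketch assumes that metadata is a list of per-block address records and that a block counts as missing when its primary copy sits on a faulty device; `BlockRef`, `find_missing_blocks`, and `rebind_block_addresses` are illustrative names, not identifiers from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRef:
    block_no: int
    system: str     # "primary" or "backup"
    device_id: int
    offset: int


def find_missing_blocks(metadata, faulty_device_ids):
    """Blocks whose primary copy is stored on a faulty storage device."""
    return [b for b in metadata
            if b.system == "primary" and b.device_id in faulty_device_ids]


def rebind_block_addresses(metadata, faulty_device_ids, backup_metadata):
    """Swap each missing block's address for the backup system's address."""
    backup_by_no = {b.block_no: b for b in backup_metadata}
    missing = {b.block_no for b in find_missing_blocks(metadata, faulty_device_ids)}
    rebound = []
    for block in metadata:
        if block.block_no in missing:
            if block.block_no not in backup_by_no:
                raise KeyError(f"no backup copy for block {block.block_no}")
            rebound.append(backup_by_no[block.block_no])
        else:
            rebound.append(block)
    return rebound
```

The rebound list still describes every block of the file, so the access module can read healthy blocks from the primary storage system and the missing ones from the backup storage system without knowing that a fault occurred.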

a second processing submodule, configured to, for a write data request in the abnormal state, request the metadata from the backup storage system 14 and return it to the metadata operation module; and also to record the write data operation, so that during the data recovery process the data is synchronized to the primary storage system 13 according to the recorded write data operations. Specifically, in the abnormal state, for a write data request, the abnormality recovery module automatically selects the backup storage system 14 as the storage target through the second processing submodule, requests the metadata only from the backup storage system 14, and returns the result to the metadata operation module, so that the access module 11 writes the data to the backup storage system 14 according to the metadata information. The abnormality recovery module also records the write data operation, and the data recovery process synchronizes the data to the primary storage system 13 based on these records.
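The record-then-synchronize behavior can be sketched with a simple write log. Here both storage systems are modeled as dictionaries mapping a data ID to a `bytearray`, which is purely an assumption for illustration, as are the class name `AbnormalWriteLog` and its methods.

```python
class AbnormalWriteLog:
    """Hypothetical log of writes redirected to the backup system."""

    def __init__(self):
        self._records = []

    def record(self, data_id, offset, length):
        # Called for each write served by the backup system in the abnormal state.
        self._records.append((data_id, offset, length))

    def replay(self, backup, primary):
        """Copy every logged range from backup to primary; return the count."""
        synced = 0
        for data_id, offset, length in self._records:
            chunk = bytes(backup[data_id][offset:offset + length])
            end = offset + len(chunk)
            buf = primary.setdefault(data_id, bytearray())
            if len(buf) < end:
                buf.extend(bytes(end - len(buf)))  # zero-fill up to the range
            buf[offset:end] = chunk
            synced += 1
        self._records.clear()
        return synced
```

Because only the logged ranges are copied, the amount of data synchronized during recovery is proportional to what was written during the abnormal period rather than to the total size of the stored data.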

In the recovery state, the abnormality recovery module is responsible for online data recovery and can provide data service and data recovery at the same time. The abnormality recovery module tracks missing block recovery through a bitmap of the faulty storage device, and it is responsible for keeping the bitmap updated during the recovery process. When detecting a missing block, it first looks up the bitmap entry corresponding to the data block according to the storage device ID.

The abnormality recovery module may further include a third processing submodule, configured to, for a read data request in the recovery state, perform missing block detection on the metadata obtained from the cache or from the primary storage system 13; if the metadata has a missing block, send request information containing the missing block information to the backup storage system 14; after the backup storage system 14 returns the metadata, construct missing block recovery operation information for the primary storage system 13 to perform data recovery; and update the faulty storage device bitmap of the primary storage system 13 and return the metadata.

Specifically, in the recovery state, for a read data request, the abnormality recovery module first searches its metadata cache through the third processing submodule. If the metadata is not found, it requests the metadata from the primary storage system 13. It then performs missing block detection on the metadata obtained from the cache or from the primary storage system 13. If a missing block is found, it requests metadata from the backup storage system 14, passing the missing block information (block number, data ID, offset). After the backup storage system 14 returns the metadata, missing block recovery operation information is constructed for the primary storage system 13, and based on this information the primary storage system 13 recovers the data from the backup storage system 14 or from the external storage system 15. After recovery, the faulty storage device bitmap is updated and the metadata is returned.

The abnormality recovery module may further include a fourth processing submodule, configured to, for a write data request in the recovery state, request the metadata from the primary storage system 13 and, after the metadata is obtained, update the faulty storage device bitmap of the primary storage system 13 if the metadata is found to have a missing block, and return the metadata. Specifically, in the recovery state, for a write data request, the abnormality recovery module requests the metadata from the primary storage system 13 through the fourth processing submodule. After obtaining the metadata, it detects missing blocks according to the faulty storage device ID. If there is a missing block, it directly updates the faulty storage device bitmap, and then returns the request result to the metadata operation module.
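A faulty storage device bitmap of the kind used to track missing block recovery could look like the sketch below. The convention that a set bit means "still missing", the class name `FaultyDeviceBitmap`, and its methods are assumptions made for illustration; the source does not specify the bitmap layout.

```python
class FaultyDeviceBitmap:
    """One bit per block on a faulty device; 1 means the block is still missing."""

    def __init__(self, device_id, block_count):
        self.device_id = device_id
        self.block_count = block_count
        # Assumption: every block starts out missing when recovery begins.
        self._bits = bytearray(b"\xff" * ((block_count + 7) // 8))

    def _check(self, block_no):
        if not 0 <= block_no < self.block_count:
            raise IndexError(f"block {block_no} out of range")

    def is_missing(self, block_no):
        self._check(block_no)
        return bool(self._bits[block_no // 8] & (1 << (block_no % 8)))

    def mark_recovered(self, block_no):
        # Called after a block is restored, or after a write replaces it.
        self._check(block_no)
        self._bits[block_no // 8] &= ~(1 << (block_no % 8)) & 0xFF

    def all_recovered(self):
        return not any(self.is_missing(n) for n in range(self.block_count))
```

Under this reading, a write in the recovery state can clear the bit directly because the new data supersedes the lost block, so nothing has to be copied back for it.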

In the recovery state, the abnormality recovery module starts a background recovery thread. The recovery thread selects the recovery procedure according to the type of device fault being recovered from. If the storage device had no read/write fault (for example, the storage device was hot-unplugged and then plugged in again), it is only necessary to save to the primary storage system the data that was written to the backup storage system during the abnormal period, and to delete the rebound metadata cache.

If the storage device had a read/write fault, the recovery thread traverses the data of the primary storage system, and the recovery procedure is as follows. The abnormality recovery module first searches its metadata cache. If the metadata is found in the cache, it checks whether the backup storage system holds the data. If it does (that is, a rebound metadata cache entry exists), missing block recovery operation information is constructed for the primary storage system, and based on this information the primary storage system recovers the data from the backup storage system. After recovery, the rebound metadata cache entry is deleted and the faulty storage device bitmap is updated.

If the metadata is not found in the metadata cache, the metadata is requested from the primary storage system, and missing block detection is performed on it. If a missing block is found, metadata is requested from the backup storage system; the request information includes the missing block information (block number, data ID, offset) but does not include an external storage address. After the backup storage system returns the metadata, missing block recovery operation information is constructed for the primary storage system, and based on this information the primary storage system recovers the data either from the backup storage system or from the external storage system. After recovery, the faulty storage device bitmap is updated.

After the background recovery thread has traversed the data of the primary storage system, it also traverses the write data operations recorded by the abnormality recovery module during the abnormal period, saves the data written to the backup storage system to the primary storage system, and then updates the faulty storage device bitmap and deletes the metadata cache.

As shown in FIG. 2, the distributed file system may further include a primary data management module, connected between the primary storage system 13 and the metadata management unit 12, for managing the data stored by the primary storage system 13 and responding to metadata requests and data operation requests. Specifically, the primary data management module is responsible for responding to metadata requests, and also for receiving recovery operation information and carrying out data recovery. There are two types of recovery operation information: one carries backup storage system address information, and the other carries external storage system address information. According to the recovery operation information, the primary data management module recovers the data block from either the backup storage system or the external storage system.

The distributed file system may further include a backup data management module, connected between the backup storage system 14 and the metadata management unit 12, for managing the data stored by the backup storage system 14 and responding to metadata requests and data operation requests. Specifically, the backup data management module is responsible for responding to metadata requests, and also for receiving backup requests and carrying out data backup. There are two types of metadata requests: the first type carries an external storage address, and the second type does not. The backup data management module handles the two types differently when the backup storage system 14 does not find the metadata: for the first type, it requests all of the data from the external storage system 15, stores it, and returns the metadata of the stored data; for the second type, it does not contact the external storage system 15 and returns an empty result. When the backup storage system processes a backup request, it requests the data directly from the primary storage system 13 and saves it as the backup request specifies.

In the above distributed file system, the primary storage system includes a plurality of high-speed storage devices, including but not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates. The primary storage system is also responsible for monitoring the status of its own storage devices and reporting device events (such as device failure and device recovery). Device faults include storage device read/write failures and hot removal of a storage device. The backup storage system includes a plurality of high-speed storage devices and/or low-speed storage devices; the high-speed storage devices include but are not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates, and the low-speed storage devices include but are not limited to storage devices with low data transfer rates.

As described in the above embodiments, there are two types of recovery operation information: one carries the standby storage system address information, and the other carries the external storage system address information. The primary data management module recovers the data block from the standby storage system or the external storage system according to the recovery operation information. That is, for recovery operation information carrying the standby storage system address information, the primary data management module recovers the data block from the standby storage system; for recovery operation information carrying the external storage system address information, the primary data management module recovers the data block from the external storage system. In the recovery state, the recovery operation information is constructed when data is read, so that the read is served and the accessed data is restored immediately. According to the principle of data locality, recently accessed data is also the data that most users are interested in, so restoring this data first helps improve performance.

Regarding the external storage address: in the abnormal state, reading data requires requesting metadata from the standby storage system, and this request includes the external storage address. When the standby storage system does not itself have the requested metadata, it first requests the data from the external storage system (which is equivalent to backing up the entire data from the external storage system to the standby storage system) and then returns the metadata. In this way, the read is served and the data is also backed up from the external storage system to the standby storage system, so that the data can be recovered from the standby storage system during recovery, which speeds up data recovery.

In the recovery state, when data is read or when the background data recovery thread runs, metadata may be requested from the standby storage system, and this request does not include the external storage address. If the standby storage system does not itself have the requested metadata, it does not request data from the external storage system and returns null. The primary storage system then uses the external storage address to request data from the external storage system (only the data contained in the missing block is requested, because only that data needs to be recovered).
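The different handling of the two metadata request types can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the class and field names (`MetadataRequest`, `StandbyDataManager`, `external_address`), the address string `ext://store`, and the use of dictionaries in place of real storage systems are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MetadataRequest:
    path: str
    # Set only for the first request type (used in the abnormal state).
    external_address: Optional[str] = None


class StandbyDataManager:
    """Toy standby data management module; every name here is hypothetical."""

    def __init__(self, external_store: Dict[str, bytes]):
        self.external_store = external_store  # stands in for external storage system 15
        self.local: Dict[str, bytes] = {}     # data held by standby storage system 14

    def handle_metadata_request(self, req: MetadataRequest) -> Optional[dict]:
        if req.path not in self.local:
            if req.external_address is None:
                # Second type: do not contact external storage, report a miss.
                return None
            # First type: fetch the whole object from external storage and keep it.
            # (The address is not interpreted in this sketch.)
            data = self.external_store.get(req.path)
            if data is None:
                return None
            self.local[req.path] = data
        return {"path": req.path, "size": len(self.local[req.path]), "location": "standby"}


standby = StandbyDataManager({"/video/a.ts": b"0123456789"})
miss = standby.handle_metadata_request(MetadataRequest("/video/a.ts"))
hit = standby.handle_metadata_request(MetadataRequest("/video/a.ts", "ext://store"))
later = standby.handle_metadata_request(MetadataRequest("/video/a.ts"))
```

In this sketch the first call misses because no external address is supplied, the second call pulls the data into the standby store, and the third call succeeds without an external address because the data is now held locally.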

The distributed file system provided by the embodiments of the present invention writes data to the primary storage system and backs it up to the standby storage system using an asynchronous backup mechanism, so the read/write performance of the high-speed devices is not affected. Data can be restored online, and data recovery is performed automatically while the system serves data externally, which speeds up the recovery process. No computation is required when recovering data, and even data that has not been backed up can be restored from external storage. In a CDN environment, a policy-based backup mechanism, together with the available external storage from which data without a backup can be obtained, allows partial data backup without affecting availability. In addition, inexpensive storage devices can be used to build the standby storage system, which reduces product cost.

FIG. 3 is a schematic diagram of a process flow of a distributed file system in a normal state according to an embodiment of the present invention. As shown in FIG. 3, the process includes:

Step 1. The backup management module reads the data control information.

Step 2. The backup management module analyzes the read and write requests recorded in the data control information according to the backup policy. If the policy requirements are met, it forms backup control information and sends it to the standby data management module, so that the files on the primary storage system are backed up to the standby storage system.

Step 3. The standby data management module receives the backup control request (operation type, target file path, source file path) sent by the backup management module and, according to the backup control request, issues a backup request to the primary data management module.

Step 4. The primary data management module receives the backup request and reads the data from the storage device in the primary storage system.

Step 5. The primary data management module returns the data to the standby data management module.

Step 6. The standby data management module writes the data to the storage device in the standby storage system.

Step 7. Return the backup status to the backup management module.
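The normal-state backup flow of FIG. 3 might be sketched as follows. This is only an illustrative Python model under stated assumptions: the class names, the in-memory dictionaries standing in for storage devices, and the example policy "back up every file that was written" are hypothetical, since the embodiment does not specify the backup policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ControlRecord:
    """Data control information recorded for one read/write request (assumed shape)."""
    op: str    # "read" or "write"
    path: str


@dataclass
class BackupControlRequest:
    """Fields named in step 3: operation type, target file path, source file path."""
    op_type: str
    target_path: str
    source_path: str


class PrimaryDataManager:
    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def read_for_backup(self, path: str) -> bytes:
        # Steps 4-5: read from the primary storage device and return the data.
        return self.files[path]


class StandbyDataManager:
    def __init__(self, primary: PrimaryDataManager):
        self.primary = primary
        self.files: Dict[str, bytes] = {}

    def handle(self, req: BackupControlRequest) -> bool:
        # Step 3: request the data from the primary data management module.
        data = self.primary.read_for_backup(req.source_path)
        # Step 6: write it to the standby storage device.
        self.files[req.target_path] = data
        # Step 7: report the backup status.
        return True


def run_backup(records: List[ControlRecord],
               policy: Callable[[ControlRecord], bool],
               standby: StandbyDataManager) -> List[str]:
    """Steps 1-2: scan the control records and back up what the policy selects."""
    backed_up: List[str] = []
    for rec in records:
        if policy(rec) and rec.path not in backed_up:
            if standby.handle(BackupControlRequest("backup", rec.path, rec.path)):
                backed_up.append(rec.path)
    return backed_up


primary = PrimaryDataManager({"/a": b"hot", "/b": b"cold"})
standby = StandbyDataManager(primary)
records = [ControlRecord("write", "/a"), ControlRecord("read", "/b"), ControlRecord("write", "/a")]
done = run_backup(records, lambda r: r.op == "write", standby)
```

Because the backup is driven from recorded control information rather than from the write path itself, the application's writes to the primary storage system do not wait for the standby copy, which is the asynchronous property the description relies on.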

FIG. 4 is a schematic flowchart of a process in a distributed file system in an abnormal state and in a case of a read data request according to an embodiment of the present invention. As shown in FIG. 4, the process includes:

Step 1. The application sends a request for reading data to the access module.

Step 2. The access module issues a read metadata request to the metadata operation module.

Step 3. The metadata operation module forwards the read metadata request to the abnormality recovery module.

Step 4. The abnormality recovery module first searches for the metadata in the metadata cache; if the metadata is found, go to step 6.

Step 5. The abnormality recovery module initiates a metadata request to the primary data management module. The primary data management module receives the metadata request, reads the metadata from the storage device in the primary storage system, and returns the metadata to the abnormality recovery module.

Step 6. The abnormality recovery module checks whether the metadata returned from the metadata cache or from the primary storage system has a missing block. If so, it initiates a metadata request to the standby storage system, carrying external backup control information (including the external storage system location and the data location information). If not, it returns the metadata to the metadata operation module and goes to step 9.

Step 7. If the data has been backed up in the standby storage system, the standby data management module returns the metadata. Otherwise, the standby data management module requests the data from the external storage system according to the external backup control information, stores the obtained data in the standby storage system, and returns the metadata.

Step 8. The abnormality recovery module performs a block address rebinding operation: it modifies the block mapping table in the metadata of the missing block, replaces the missing block address with the corresponding block address on the standby storage system, caches the rebound metadata, and returns all of the metadata to the metadata operation module.

Step 9. The metadata operation module returns the metadata to the access module.

Step 10: The access module initiates a data request to the corresponding primary storage system and the standby storage system according to the returned metadata information.

Step 11. The primary storage system and the backup storage system return data to the access module.

Step 12. The access module returns data to the application.
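The block address rebinding operation of step 8 can be pictured with the following Python sketch. The layout of the block mapping table as a list of `(storage system, block number)` pairs, and the helper tables `device_of` and `standby_blocks`, are assumptions chosen for illustration; the embodiment does not define these structures.

```python
from typing import Dict, List, Set, Tuple

# (storage system, block number); this layout is an assumption of the sketch.
BlockAddr = Tuple[str, int]


def rebind_missing_blocks(
    block_map: List[BlockAddr],
    failed_devices: Set[int],
    device_of: Dict[int, int],
    standby_blocks: Dict[int, int],
) -> List[BlockAddr]:
    """Return a copy of block_map in which every block that lives on a failed
    primary device is redirected to its copy on the standby storage system."""
    rebound: List[BlockAddr] = []
    for index, (system, block) in enumerate(block_map):
        missing = system == "primary" and device_of[block] in failed_devices
        if missing:
            rebound.append(("standby", standby_blocks[index]))
        else:
            rebound.append((system, block))
    return rebound


block_map = [("primary", 10), ("primary", 11), ("primary", 12)]
device_of = {10: 0, 11: 1, 12: 0}   # primary block number -> device id
standby_blocks = {1: 501}           # file block index -> standby block number
rebound = rebind_missing_blocks(block_map, {1}, device_of, standby_blocks)
```

With device 1 failed, only the second entry is redirected, so the access module in step 10 would read the first and third blocks from the primary storage system and the second from the standby storage system.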

FIG. 5 is a schematic diagram of a process flow of a distributed file system in an abnormal state and in the case of a write data request according to an embodiment of the present invention. As shown in FIG. 5, the process includes:

Step 1. The application sends a request for writing data to the access module.

Step 3. The metadata operation module forwards the read metadata request to the abnormality recovery module.

Step 4. The abnormality recovery module directly initiates a metadata request to the standby storage system.

Step 5. The standby data management module receives the metadata request, constructs the metadata, and returns it.

Step 6. The abnormality recovery module returns the metadata to the metadata operation module.

Step 7. The metadata operation module returns the metadata to the access module.

Step 8. The access module initiates a data request to the standby storage system according to the returned metadata.

Step 9. The access module writes the data to the standby storage system.

Step 10. The access module returns a write data result to the application.

FIG. 6 is a schematic flowchart of a process for a distributed file system to respond to a read data request in a recovery state according to an embodiment of the present invention. As shown in FIG. 6, the process includes:

Step 1. The application sends a request for reading data to the access module.

Step 3. The metadata operation module forwards the read metadata request to the abnormality recovery module.

Step 4. The abnormality recovery module searches the metadata cache. If the metadata is not found, it requests the metadata from the primary storage system; if the metadata is found, it skips to step 6.

Step 5. The primary storage system receives the metadata request and returns the metadata.

Step 6. The abnormality recovery module checks whether there is a missing block. If so, it requests metadata from the standby storage system; if not, it skips to step 11.

Step 7. The abnormality recovery module constructs the recovery control information according to the metadata returned by the standby storage system and sends it to the primary storage system.

Step 8. The primary storage system receives the recovery control information and performs data recovery.

Step 9. Return the data recovery result to the abnormality recovery module.

Step 10. The abnormality recovery module updates the bitmap of the missing blocks to record the blocks that have been restored.

Step 11. Return metadata obtained from the metadata cache or the primary storage system to the metadata operation module.

Step 12. The metadata operation module returns the metadata to the access module.

Step 13: The access module initiates a data request to the corresponding primary storage system and the standby storage system according to the returned metadata information.

Step 14. The primary storage system and the standby storage system return data to the access module.

Step 15. The access module returns data to the application.
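The on-demand recovery triggered by a read in the recovery state could be modeled roughly as below. This Python sketch simplifies the flow of FIG. 6: it uses dictionaries for the three storage systems, a set in place of the restored-block bitmap, and serves the read from the primary storage system once the block is restored. The names `RecoveryOp`, `PrimaryStorage`, and `read_block` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class RecoveryOp:
    """Recovery operation information: which block, and where to recover it from."""
    block: int
    source: str  # "standby" or "external"


class PrimaryStorage:
    def __init__(self, blocks: Dict[int, bytes], missing: Set[int]):
        self.blocks = blocks
        self.missing = missing

    def recover(self, op: RecoveryOp,
                standby: Dict[int, bytes],
                external: Dict[int, bytes]) -> bool:
        # Choose the source named in the recovery operation information.
        source = standby if op.source == "standby" else external
        if op.block not in source:
            return False
        self.blocks[op.block] = source[op.block]
        return True


def read_block(block: int, primary: PrimaryStorage,
               standby: Dict[int, bytes], external: Dict[int, bytes],
               restored: Set[int]) -> Optional[bytes]:
    """Serve a read; restore the block first if it is still missing."""
    if block in primary.missing and block not in restored:
        # If the standby has no copy, fall back to the external storage system.
        source = "standby" if block in standby else "external"
        if not primary.recover(RecoveryOp(block, source), standby, external):
            return None
        restored.add(block)  # corresponds to updating the bitmap in step 10
    return primary.blocks.get(block)


primary = PrimaryStorage({0: b"A"}, missing={1, 2})
standby = {1: b"B"}
external = {2: b"C"}
restored: Set[int] = set()
results = [read_block(b, primary, standby, external, restored) for b in (0, 1, 2)]
```

Restoring a block at the moment it is read follows the locality argument made earlier in the description: the blocks users touch first are recovered first.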

FIG. 7 is a schematic diagram of a process flow of a distributed file system responding to a write data request in a recovery state according to an embodiment of the present invention. As shown in FIG. 7, the process includes:

Step 1. The application sends a request for writing data to the access module.

Step 3. The metadata operation module forwards the read metadata request to the abnormality recovery module.

Step 4. The abnormality recovery module requests metadata from the primary storage system.

Step 5. The primary storage system receives the metadata request and returns the metadata.

Step 6. The abnormality recovery module checks whether there is a missing block. If so, it updates the bitmap of restored missing blocks, setting the bit corresponding to the missing block to 1.

Step 7. Return the metadata to the metadata operation module.
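The bitmap update in step 6 can be sketched in Python as follows. The interpretation that a bit value of 1 means "this block no longer needs to be recovered" is an assumption of the sketch, as are the class name `RestoredBitmap` and the helper `on_write_in_recovery`.

```python
from typing import Iterable, Set


class RestoredBitmap:
    """One bit per block of the failed device; in this sketch a 1 bit is taken
    to mean that the block no longer needs to be recovered."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def set(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def is_set(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))


def on_write_in_recovery(bitmap: RestoredBitmap,
                         missing_blocks: Set[int],
                         written_blocks: Iterable[int]) -> None:
    """Mark every missing block that the write covers (step 6 of FIG. 7)."""
    for block in written_blocks:
        if block in missing_blocks:
            bitmap.set(block)


bitmap = RestoredBitmap(10)
missing = {2, 3, 9}
on_write_in_recovery(bitmap, missing, [3, 4, 9])
pending = [b for b in sorted(missing) if not bitmap.is_set(b)]
```

Under this reading, a write that overwrites a missing block makes recovering the old content unnecessary, so only block 2 remains pending in the example.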

FIG. 8 is a schematic diagram of the processing flow of data recovery by a distributed file system in the recovery state according to an embodiment of the present invention. As shown in FIG. 8, the process includes:

Step 1. The background data recovery thread searches the metadata cache for the metadata of the data to be restored; if the metadata is not found, it skips to step 6.

Step 2. If the metadata is found in the metadata cache, the background data recovery thread checks whether the data exists on the standby storage system. If there is no data on the standby storage system, it skips to step 11.

Step 3. If there is data on the standby storage system, the background data recovery thread constructs the missing block recovery operation information for the primary storage system and sends it to the primary storage system.

Step 4. The primary storage system receives and executes the missing block recovery operation information, restores the data from the standby storage system to the missing block, and returns the result of the missing block recovery operation.

Step 5. The background data recovery thread deletes the rebound metadata from the cache and skips to step 11.

Step 6. If the metadata is not found in the metadata cache, the background data recovery thread requests the metadata from the primary storage system.

Step 7. A missing block check is performed on the returned metadata. If there is no missing block, skip to step 11.

Step 8. If there is a missing block, the background data recovery thread requests recovery of the data from the standby storage system.

Step 9. If the recovery in step 8 succeeds, skip to step 11.

Step 10. If the recovery in step 8 fails, the background data recovery thread requests recovery of the data from the external storage system.

Step 11. Update the failed storage device bitmap; the data recovery is complete.
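The fallback order used by the background data recovery thread (standby storage system first, external storage system second) is sketched below in Python. The sketch leaves out the metadata cache handling of steps 1 to 5 and uses dictionaries and a set as stand-ins for the storage systems and the bitmap; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def recover_block(block: int,
                  standby: Dict[int, bytes],
                  external: Dict[int, bytes]) -> Optional[bytes]:
    """Steps 8-10: try the standby storage system first, then external storage."""
    if block in standby:
        return standby[block]
    return external.get(block)


def background_recovery(missing: List[int],
                        primary: Dict[int, bytes],
                        standby: Dict[int, bytes],
                        external: Dict[int, bytes],
                        restored: Set[int]) -> List[int]:
    """Restore every missing block not yet restored; return the blocks that failed."""
    failed: List[int] = []
    for block in missing:
        if block in restored:
            # Already restored on demand by an earlier read or write.
            continue
        data = recover_block(block, standby, external)
        if data is None:
            failed.append(block)
            continue
        primary[block] = data
        restored.add(block)  # step 11: record the block as recovered
    return failed


primary = {0: b"A"}
standby = {1: b"B"}
external = {1: b"stale", 2: b"C"}
restored = {3}
failed = background_recovery([1, 2, 3, 4], primary, standby, external, restored)
```

In the example, block 1 comes from the standby copy even though external storage also holds it, block 2 falls back to external storage, block 3 is skipped because it was already restored, and block 4 is reported as unrecoverable.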

A person skilled in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

It should be noted that the above embodiments are only intended to explain the technical solutions of the present invention and are not intended to be limiting. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A distributed file system, comprising: an access module, a metadata management unit connected to the access module, and a primary storage system and a standby storage system respectively connected to the metadata management unit; wherein the access module is further connected to the primary storage system and the standby storage system respectively; the access module, the metadata management unit, the primary storage system, and the standby storage system are interconnected through a system bus; and further comprising an external storage system, wherein the external storage system is connected to the primary storage system and the standby storage system through a network; wherein:

the access module is configured to receive a read/write data request, send a metadata request to the metadata management unit to acquire metadata corresponding to the requested data, and use the metadata to read data from or write data to the primary storage system or the standby storage system;

the metadata management unit is configured to, when the access module requests the metadata, find the location of the requested data on the primary storage system or the standby storage system, construct the metadata, and return it to the access module;

the primary storage system is configured to provide the requested data to the access module when the distributed file system is in a normal state;

the standby storage system is configured to provide data backup for the primary storage system when the distributed file system is in an abnormal state or a recovery state; and

the external storage system is configured to provide data backup for the primary storage system.

2. The distributed file system according to claim 1, wherein the metadata management unit comprises: a metadata operation module, a backup module, and an abnormality recovery module; wherein:

the metadata operation module is respectively connected to the access module and the primary storage system, and is configured to receive the metadata request, request metadata from the primary storage system or the abnormality recovery module, and return the metadata to the access module; is further configured to update the system state according to a received device status event reported by the primary storage system; and is further configured to record data control information in the read/write data request;

the backup module is connected to the metadata operation module and the standby storage system, and is configured to read the data control information recorded by the metadata operation module, generate data backup operation control information, and send it to the standby storage system, so as to back up data on the primary storage system to the standby storage system;

the abnormality recovery module is respectively connected to the metadata operation module, the backup module, the primary storage system, and the standby storage system, and is configured to acquire metadata in the abnormal state and the recovery state and return it to the metadata operation module.

3. The distributed file system according to claim 2, wherein the metadata management unit further comprises:

a data control information recording module, connected between the metadata operation module and the backup module and configured to store the data control information.

4. The distributed file system according to claim 2, wherein the abnormality recovery module comprises:

a first processing submodule, configured to, in the abnormal state and for a read data request, perform missing block detection on metadata obtained from a cache or from the primary storage system; if the metadata is detected to have a missing block, send request information to the standby storage system, the request information including missing block information and an address of the external storage system; and, after the standby storage system returns the metadata, perform a block address rebinding operation to reorganize the available metadata and send it to the metadata operation module;

a second processing submodule, configured to, in the abnormal state and for a write data request, request metadata from the standby storage system and return it to the metadata operation module; and further configured to record the write data operation, so that data is synchronized to the primary storage system according to the recorded write data operations during data recovery;

a third processing submodule, configured to, in the recovery state and for a read data request, perform missing block detection on metadata obtained from the cache or from the primary storage system; if the metadata is detected to have a missing block, send request information including the missing block information to the standby storage system; after the standby storage system returns the metadata, construct missing block recovery operation information for the primary storage system to perform data recovery; and update the failed storage device bitmap of the primary storage system and return the metadata;

a fourth processing submodule, configured to, in the recovery state and for a write data request, request metadata from the primary storage system and, after obtaining the metadata, if it is detected that the metadata has a missing block, update the failed storage device bitmap of the primary storage system and return the metadata.

5. The distributed file system according to claim 1 or 2 or 3 or 4, further comprising:

a primary data management module, connected between the primary storage system and the metadata management unit, for managing data stored by the primary storage system and responding to metadata requests and data operation requests;

a standby data management module, connected between the standby storage system and the metadata management unit, for managing data stored by the standby storage system and responding to metadata requests and data operation requests.

6. The distributed file system according to claim 5, wherein the primary storage system comprises a plurality of high-speed storage devices, and the high-speed storage devices include but are not limited to SCSI hard disks with high data transfer rates, SATA hard disks, and SSDs;

the standby storage system comprises a plurality of high-speed storage devices and/or low-speed storage devices, wherein the high-speed storage devices include SCSI hard disks, SATA hard disks, and SSDs.

7. The distributed file system according to claim 1, wherein:

the normal state means that the primary storage system has not failed;

the abnormal state means that the primary storage system has failed, and the primary storage system and the standby storage system work together to coordinate data storage and save the coordination result, wherein the coordination result is used in the data recovery process;

the recovery state means that, after an abnormal state, data in the primary storage system is restored using data on the standby storage system.