WO2012075845A1 - Distributed file system - Google Patents


Info

Publication number
WO2012075845A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage system
metadata
data
module
backup
Prior art date
Application number
PCT/CN2011/079685
Other languages
French (fr)
Chinese (zh)
Inventor
张辉
范家星
姜南
吴波
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2012075845A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • Embodiments of the present invention relate to data backup technologies, and in particular, to a distributed file system.
  • Background
  • RAID: Redundant Array of Independent Disks.
  • RAID combines multiple independent hard disks (physical hard disks) in different ways to form a disk group (a logical hard disk), providing higher storage performance than a single hard disk together with data backup.
  • The prior art generally uses the storage striping management of a distributed file system to aggregate hard disk bandwidth.
  • Striping is a management function that spreads data across multiple storage devices in a fixed stride, so that reads fetch data from multiple physical storage devices in parallel and the performance of those devices is combined.
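  • The striping described above can be illustrated with a minimal sketch. The function names `stripe` and `unstripe`, the round-robin placement, and the in-memory lists standing in for storage devices are assumptions made for illustration; the source does not specify a layout.

```python
def stripe(data: bytes, num_devices: int, stripe_size: int) -> list:
    """Spread data over num_devices in round-robin chunks of stripe_size bytes."""
    devices = [[] for _ in range(num_devices)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        # Chunk k goes to device k mod N (hypothetical placement rule).
        devices[chunk_index % num_devices].append(data[offset:offset + stripe_size])
    return devices


def unstripe(devices: list) -> bytes:
    """Reassemble the original byte string by reading one chunk per device per row."""
    rows = max((len(d) for d in devices), default=0)
    out = []
    for row in range(rows):
        for device in devices:
            if row < len(device):
                out.append(device[row])
    return b"".join(out)


layout = stripe(b"ABCDEFGHIJ", num_devices=3, stripe_size=2)
restored = unstripe(layout)
```

  • In a real system each per-device read would be issued in parallel, which is where the bandwidth aggregation comes from; the sketch only shows the placement arithmetic.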
  • The data redundancy and data striping features of RAID technology ensure the high reliability and high performance of distributed file systems.
  • A distributed file system can be understood as being built on a network storage system.
  • Hard disks are increasingly being replaced by solid-state disks (SSDs).
  • Because solid-state disks are extremely expensive, RAID technology that backs up SSDs with SSDs causes a sharp increase in server cost.
  • RAID technology is commonly implemented as RAID 1 or RAID 5.
  • RAID 1 is level-1 RAID. It uses full mirroring: two homogeneous storage systems perform read and write operations simultaneously and mirror each other, so the system can keep working normally even if one disk is damaged.
  • RAID 5 is a storage solution that balances storage performance, data security, and storage cost.
  • Data and the corresponding parity information are stored across the disks that make up the RAID 5 array, with the parity information and its corresponding data kept on different disks.
  • When the data on one disk is corrupted, the remaining data and the corresponding parity information are used to recover it.
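  • As a rough illustration of parity-based recovery, the sketch below assumes byte-wise XOR parity, which is the scheme RAID 5 conventionally uses; the source text does not spell out the parity function. The helper name `xor_blocks` and the sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks that would live on three different disks.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
# The parity block would live on a fourth disk.
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held data_blocks[1]: XOR of the survivors
# and the parity block reproduces the lost block.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

  • Recovery of this kind requires a computation over all surviving disks, which is the cost the backup-based design described later avoids.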
  • An embodiment of the present invention provides a distributed file system, including: an access module, a metadata management unit connected to the access module, a primary storage system and a backup storage system each connected to the metadata management unit, and an external storage system.
  • The access module is further connected to the primary storage system and to the backup storage system. The access module, the metadata management unit, the primary storage system, and the backup storage system are connected to one another by a system bus, and the external storage system is connected to the primary storage system and the backup storage system by a network. The access module is configured to receive a read/write data request, send a metadata request to the metadata management unit to acquire the metadata corresponding to the requested data, and use the metadata to read or write data on the primary storage system or the backup storage system;
  • the metadata management unit is configured to, when the access module requests metadata, find the location of the requested data on the primary storage system or the backup storage system, construct the metadata, and return it to the access module; the primary storage system is configured to provide the requested data to the access module when the distributed file system is in a normal state; the backup storage system is configured to provide data backup for the primary storage system when the distributed file system is in an abnormal state or a recovery state; and the external storage system is configured to provide data backup for the primary storage system.
  • Data is backed up to the backup storage system by an asynchronous backup mechanism, which does not affect the read and write performance of the high-speed devices;
  • data is restored automatically while the system continues to serve external requests, which speeds up the recovery process.
  • FIG. 1 is a schematic diagram of the composition of a distributed file system according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the composition of a distributed file system according to another embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of the processing of a distributed file system in a normal state according to an embodiment of the present invention;
  • FIG. 4 is a schematic flowchart of the processing of a read data request by a distributed file system in an abnormal state according to an embodiment of the present invention;
  • FIG. 5 is a schematic flowchart of the processing of a write data request by a distributed file system in an abnormal state according to an embodiment of the present invention;
  • FIG. 6 is a schematic flowchart of the processing of a read data request by a distributed file system in a recovery state according to an embodiment of the present invention;
  • FIG. 7 is a schematic flowchart of the processing of a write data request by a distributed file system in a recovery state according to an embodiment of the present invention;
  • FIG. 8 is a schematic flowchart of the data recovery process of a distributed file system in the recovery state according to an embodiment of the present invention.
  • The distributed file system 1 includes an access module 11, a metadata management unit 12 connected to the access module 11, and a primary storage system 13 and a backup storage system 14 each connected to the metadata management unit 12. The access module 11 is also connected to the primary storage system 13 and to the backup storage system 14. The access module 11, the metadata management unit 12, the primary storage system 13, and the backup storage system 14 are all located in the internal network, and these functional modules are connected by a system bus;
  • the distributed file system 1 further includes an external storage system 15 located in the external network, and the external storage system 15 is connected to the primary storage system 13 and the backup storage system 14 through a network.
  • The access module 11 in the distributed file system 1 is configured to receive a read/write data request, send a metadata request to the metadata management unit 12 to obtain the metadata corresponding to the requested data, and use the metadata to read or write data on the primary storage system 13 or the backup storage system 14;
  • the access module 11 is the entry point through which data in the distributed file system 1 is accessed. It receives read and write data requests from applications, acquires metadata from the metadata management unit 12, and uses the metadata information to read and write data on the primary storage system 13 and the backup storage system 14.
  • As a functional module, it can be deployed on a single processing unit, such as a PC or a board. There is at least one access module, and generally several, so that the system can offer high throughput externally.
  • The metadata management unit 12 is configured to, when the access module 11 requests metadata, find the location of the requested data on the primary storage system 13 or the backup storage system 14, construct the metadata, and return it to the access module 11;
  • it is also configured to switch the system state of the distributed file system 1 according to device status events of the primary storage system 13.
  • The distributed file system 1 has three states: normal, abnormal, and recovery. In the normal state, no storage device of the primary storage system 13 is faulty; data is stored on the primary storage system 13 and is backed up to the backup storage system 14 when necessary.
  • The abnormal state means that a storage device in the primary storage system 13 has failed.
  • In this state, the primary storage system 13 and the backup storage system 14 cooperate to complete data storage, and the result of this coordination is saved for use in the data recovery process.
  • The recovery state means that, after the failed storage device in the primary storage system 13 has been restored, the system is triggered to perform system data recovery: the backup data on the backup storage system 14 is used to restore the data originally stored on the failed storage device.
  • The metadata management unit 12 is responsible for finding the location of the data on the storage systems and constructing the metadata when the access module 11 requests metadata, and for returning it to the access module 11.
  • The metadata management unit 12 is also responsible for managing the system state: it receives device status events from the primary storage system 13, switches the system state, selects the storage system, and determines the data storage device (location) according to the event information. It is also responsible for system reliability, automatically backing up data based on data access information.
  • Usable metadata is assembled automatically according to how the data is distributed across the primary, backup, and external storage systems, which ensures system availability; the unit is also responsible for automatic online recovery, which ensures data availability and data consistency.
  • The metadata management unit 12 can be deployed separately on one server.
  • The primary storage system 13 is configured to provide the requested data to the access module 11 in the normal state.
  • The primary storage system 13 is the main storage of the distributed file system 1. It stores all of the system's data and is designed for high read/write performance.
  • In the normal state, the system uses the primary storage system 13 to read and write data, and all data is stored on the primary storage system 13, which is composed of high-speed storage devices.
  • It uses storage striping to aggregate storage bandwidth and improve read and write performance, and it provides a block access mode.
  • The access module 11 can directly read and write the data stored on it in blocks.
  • The backup storage system 14 is configured to provide data backup for the primary storage system 13 in the abnormal state and the recovery state.
  • The backup storage system 14 is the secondary storage of the distributed file system 1. It backs up data to support system reliability, availability, and data recoverability, uses storage striping to aggregate storage bandwidth and improve read and write performance, and provides a block access mode; the access module 11 can directly read and write the data stored on it in blocks.
  • The external storage system 15 is configured to provide data backup for the primary storage system 13.
  • The external storage system 15 is a storage system that holds more data; it may be an upper-layer system or another system in the same layer.
  • The external storage system 15 is connected to the other modules of the system through the network, is read and written using network access, and serves as a supplementary external data backup. If the corresponding data cannot be found on the primary storage system 13 or the backup storage system 14, the data can be requested from the external storage system 15.
  • The internal network mentioned above is the network that directly connects the access module 11, the metadata management unit 12, the primary storage system 13, and the backup storage system 14; it may be an Ethernet network or an internal bus (e.g., a PCIe bus).
  • Metadata is mainly information that describes data attributes and supports functions such as indicating storage location, history information, resource information, and file records. For example, it may be the storage address of the requested data (for the requested length and offset), which may include a storage device number, a storage block number, and a data offset. Metadata can also be the inode number of the data.
  • The inode (index node) of the data is stored on the storage device and identifies the storage address of the data. From the inode number of the data, the storage address of the inode can be calculated.
  • The distributed file system 1 includes an access module 11, a metadata management unit 12, a primary storage system 13, a backup storage system 14, and an external storage system 15. Further, the metadata management unit 12 includes a metadata operation module, a backup module, and an exception recovery module, wherein:
  • The metadata operation module is connected to the access module 11 and to the primary storage system 13. It receives the metadata request, requests the metadata from the primary storage system 13 or from the exception recovery module, and returns the metadata to the access module 11.
  • It is further configured to update the system state according to the device status events reported by the primary storage system 13, is responsible for state management of the system, and records the data control information carried in read/write data requests.
  • The metadata operation module receives all metadata requests from the access module 11 and first searches its metadata cache. If the metadata is not found, it requests the metadata from the primary storage system 13 or the backup storage system 14, caches the metadata it obtains, and finally returns the metadata to the access module 11. In the normal state, the metadata cache is searched.
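  • The cache-first lookup performed by the metadata operation module might look roughly like the following sketch. The class name `MetadataOperationModule`, the `fetch_from_storage` callback, and the dictionary-shaped metadata are all hypothetical; the source only states that the cache is consulted before the storage systems.

```python
class MetadataOperationModule:
    """Cache-first metadata lookup (illustrative only)."""

    def __init__(self, fetch_from_storage):
        # fetch_from_storage(data_id) stands in for a request to the primary
        # or backup storage system; it returns metadata or None.
        self._fetch_from_storage = fetch_from_storage
        self._cache = {}

    def get_metadata(self, data_id):
        # 1. Search the metadata cache first.
        if data_id in self._cache:
            return self._cache[data_id]
        # 2. On a miss, ask the storage system.
        metadata = self._fetch_from_storage(data_id)
        # 3. Cache what was obtained, then return it to the caller.
        if metadata is not None:
            self._cache[data_id] = metadata
        return metadata


calls = []


def fake_storage(data_id):
    calls.append(data_id)
    return {"device": 0, "block": 42, "offset": 0} if data_id == "file-a" else None


module = MetadataOperationModule(fake_storage)
first = module.get_metadata("file-a")
second = module.get_metadata("file-a")  # served from the cache
```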
  • The metadata management unit 12 may include a data control information recording module, connected between the metadata operation module and the backup module, for persistently storing the data control information.
  • In the abnormal state, the metadata operation module forwards the metadata request sent by the access module 11 to the exception recovery module, and the exception recovery module is responsible for obtaining the metadata information from the primary storage system 13 and the backup storage system 14.
  • In the recovery state, the write data process is the same as in the normal state: the metadata operation module requests metadata from the primary storage system 13. The read data process is handed over to the exception recovery module.
  • The metadata operation module is responsible for state management.
  • The primary storage system 13 is responsible for reporting device status events. When a device failure event is received, the faulty device ID is recorded and the system state changes from normal to abnormal. When a device recovery event is received, the system state changes from abnormal to recovery. When the recovery operation is complete and the exception recovery module has notified the metadata operation module, the system state changes from recovery to normal.
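  • The three system states and the events that move between them (device failure, device recovery, and completion of the recovery operation) can be sketched as a small state machine. The names `SystemState`, `Event`, and `StateManager` are hypothetical, and ignoring events that do not apply to the current state is an assumption, since the source does not say how such events are handled.

```python
from enum import Enum, auto


class SystemState(Enum):
    NORMAL = auto()
    ABNORMAL = auto()
    RECOVERY = auto()


class Event(Enum):
    DEVICE_FAILURE = auto()
    DEVICE_RECOVERY = auto()
    RECOVERY_COMPLETE = auto()


# Allowed transitions: normal -> abnormal -> recovery -> normal.
TRANSITIONS = {
    (SystemState.NORMAL, Event.DEVICE_FAILURE): SystemState.ABNORMAL,
    (SystemState.ABNORMAL, Event.DEVICE_RECOVERY): SystemState.RECOVERY,
    (SystemState.RECOVERY, Event.RECOVERY_COMPLETE): SystemState.NORMAL,
}


class StateManager:
    def __init__(self):
        self.state = SystemState.NORMAL
        self.faulty_device_id = None

    def handle(self, event, device_id=None):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Assumption: events that do not apply in this state are ignored.
            return self.state
        if event is Event.DEVICE_FAILURE:
            self.faulty_device_id = device_id  # record the faulty device ID
        elif event is Event.RECOVERY_COMPLETE:
            self.faulty_device_id = None
        self.state = next_state
        return self.state


manager = StateManager()
after_failure = manager.handle(Event.DEVICE_FAILURE, device_id=3)
after_recovery_event = manager.handle(Event.DEVICE_RECOVERY, device_id=3)
after_completion = manager.handle(Event.RECOVERY_COMPLETE)
```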
  • The backup module is connected to the metadata operation module and to the backup storage system 14. It reads the data control information recorded by the metadata operation module, generates data backup operation control information, and sends it to the backup storage system 14.
  • The data on the primary storage system 13 is thereby backed up to the backup storage system 14. Specifically, in the normal state, the module runs as a background thread: it reads the data control information recorded by the metadata operation module during the preceding period, analyzes data usage according to the backup policy, and generates a data backup request. It then issues the backup operation control information (operation type, target file path, source file path) to the backup storage system 14, requiring the file on the primary storage system to be backed up to a specified location on the backup storage system.
  • Combined with the backup policy, the backup module can flexibly implement various data backup schemes.
  • These include a full backup scheme: when analyzing data, only write data operations are analyzed, and a backup request is generated whenever data is written.
  • They include a hotspot backup scheme: when analyzing data, only read data usage (number of read requests, read frequency) is analyzed, and a backup request is generated according to the data hotspot conditions in the policy (number of reads, read frequency).
  • They also include a scheme that backs up specified data: the policy specifies a data feature code, and when analyzing data, the data is checked against the specified feature code and a backup request is generated for matching data. In each case the analysis uses the information recorded for the data, and the backup request is generated from that information.
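  • A policy check covering the three backup schemes could be sketched as follows. The record fields, the scheme names `"full"`, `"hotspot"`, and `"specified"`, and the use of a simple read-count threshold are assumptions for illustration; the source also mentions read frequency, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class AccessRecord:
    """Hypothetical summary of the data control information for one file."""
    data_id: str
    writes: int = 0
    reads: int = 0
    feature_code: str = ""


@dataclass
class BackupPolicy:
    scheme: str                      # "full", "hotspot", or "specified"
    min_reads: int = 0               # hotspot threshold (assumed form)
    feature_codes: frozenset = frozenset()


def needs_backup(record: AccessRecord, policy: BackupPolicy) -> bool:
    if policy.scheme == "full":
        # Full backup: any write triggers a backup request.
        return record.writes > 0
    if policy.scheme == "hotspot":
        # Hotspot backup: only read usage is analyzed.
        return record.reads >= policy.min_reads
    if policy.scheme == "specified":
        # Specified-data backup: match on the data feature code.
        return record.feature_code in policy.feature_codes
    raise ValueError(f"unknown backup scheme: {policy.scheme}")


records = [
    AccessRecord("a", writes=1, reads=2, feature_code="log"),
    AccessRecord("b", writes=0, reads=50, feature_code="video"),
]
full = [r.data_id for r in records if needs_backup(r, BackupPolicy("full"))]
hot = [r.data_id for r in records if needs_backup(r, BackupPolicy("hotspot", min_reads=10))]
chosen = [
    r.data_id
    for r in records
    if needs_backup(r, BackupPolicy("specified", feature_codes=frozenset({"log"})))
]
```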
  • The exception recovery module is connected to the metadata operation module, the backup module, the primary storage system 13, and the backup storage system 14. It acquires the metadata in the abnormal state and the recovery state and returns it to the metadata operation module.
  • The exception recovery module is responsible for obtaining metadata information in the abnormal and recovery states, for keeping the metadata cache valid, and for controlling the data recovery operation of the primary storage system 13, so that the system remains available when the primary storage system 13 fails.
  • The exception recovery module may include the following submodules:
  • A first processing submodule, configured to, for a read data request in the abnormal state, perform missing-block detection on the metadata obtained from the cache or from the primary storage system 13. If the metadata is found to have a missing block, request information containing the missing block information and the address of the external storage system 15 is sent to the backup storage system 14; after the backup storage system 14 returns the metadata, a block address rebinding operation is performed, and the usable metadata is reassembled and returned.
  • Through the first processing submodule, the exception recovery module first searches its metadata cache; if no metadata is found, it requests the metadata from the primary storage system 13. It then performs missing-block detection on the metadata (a block is missing when its data is stored on the faulty storage device, which is detected by its storage device ID).
  • If a missing block is found, the metadata is requested from the backup storage system 14. The request information includes the missing block information (block number, data ID, offset) and the address of the external storage system 15. After the backup storage system 14 returns the metadata, a block address rebinding operation is performed.
  • The block address rebinding operation replaces the metadata information for the missing block on the primary storage system 13 with the metadata information for that block on the backup storage system 14, reassembling usable metadata; the metadata is then cached to speed up later metadata acquisition.
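  • Missing-block detection and block address rebinding can be pictured with the sketch below. The `BlockAddr` record, the `"primary"`/`"backup"` labels, and the representation of metadata as an ordered list of block addresses are hypothetical; the source only states that a block is missing when it sits on the faulty storage device and that rebinding points the metadata at the backup storage system's copy of that block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockAddr:
    system: str      # "primary" or "backup" (assumed labels)
    device_id: int
    block_no: int


def find_missing_blocks(blocks, faulty_device_id):
    """Return indexes of blocks that live on the faulty primary storage device."""
    return [
        index
        for index, addr in enumerate(blocks)
        if addr.system == "primary" and addr.device_id == faulty_device_id
    ]


def rebind(blocks, backup_addrs):
    """Replace missing-block entries with the addresses returned by backup storage.

    backup_addrs maps a block index to its BlockAddr on the backup storage system.
    """
    return [backup_addrs.get(index, addr) for index, addr in enumerate(blocks)]


metadata = [
    BlockAddr("primary", 0, 5),
    BlockAddr("primary", 1, 6),   # device 1 is the faulty device
    BlockAddr("primary", 0, 7),
]
missing = find_missing_blocks(metadata, faulty_device_id=1)
usable = rebind(metadata, {1: BlockAddr("backup", 3, 9)})
```

  • The rebound list is what would be cached and returned to the metadata operation module, so that the access module reads the affected block from the backup storage system.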
  • A second processing submodule, configured to, for a write data request in the abnormal state, request metadata from the backup storage system 14 and return it to the metadata operation module; it also records the write data operation, so that the data can be synchronized to the primary storage system 13 according to the recorded write data operations during the recovery process.
  • Through the second processing submodule, the exception recovery module automatically selects the backup storage system 14 as the storage target and requests metadata only from the backup storage system 14.
  • The access module 11 writes the data to the backup storage system 14 according to the metadata information.
  • The exception recovery module also records the write data operation, and the data recovery process synchronizes the data to the primary storage system 13 based on these records.
  • In the recovery state, the exception recovery module is responsible for online data recovery and can serve data and recover data at the same time.
  • The exception recovery module tracks the recovery of missing blocks using a bitmap of the faulty storage device.
  • The exception recovery module is responsible for keeping the bitmap up to date.
  • The exception recovery module may further include a third processing submodule, configured to, for a read data request in the recovery state, perform missing-block detection on the metadata obtained from the cache or from the primary storage system 13. If the metadata has a missing block, request information containing the missing block information is sent to the backup storage system 14. After the backup storage system 14 returns the metadata, missing-block recovery operation information is constructed for the primary storage system 13 so that it can perform data recovery; the faulty storage device bitmap of the primary storage system 13 is updated, and the metadata is returned.
  • Through the third processing submodule, the exception recovery module first searches its metadata cache; if the metadata is not found, it requests the metadata from the primary storage system 13. It then performs missing-block detection on the metadata obtained from the cache or from the primary storage system 13. If a missing block is found, the metadata is requested from the backup storage system 14, passing the missing block information (block number, data ID, offset). After the backup storage system 14 returns the metadata, missing-block recovery operation information for the primary storage system 13 is constructed, and based on this information the primary storage system 13 restores the data from the backup storage system 14 or from the external storage system 15.
  • The exception recovery module may further include a fourth processing submodule, configured to, for a write data request in the recovery state, request metadata from the primary storage system 13; once the metadata is obtained, if it is found to have a missing block, the faulty storage device bitmap of the primary storage system 13 is updated, and the metadata is returned.
  • Through the fourth processing submodule, the exception recovery module requests the metadata from the primary storage system 13 and, after obtaining it, uses the faulty storage device ID to detect missing blocks. If there is a missing block, it directly updates the faulty storage device bitmap and then returns the request result to the metadata operation module.
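  • One possible shape for the faulty storage device bitmap is sketched below. The class name `FaultyDeviceBitmap` and the convention that a set bit means the block no longer needs recovery (because it was restored, or overwritten by a new write during the recovery state) are assumptions; the source says only that the bitmap indicates missing-block recovery.

```python
class FaultyDeviceBitmap:
    """One bit per block of the faulty storage device (illustrative only)."""

    def __init__(self, num_blocks: int):
        self._num_blocks = num_blocks
        self._bits = bytearray((num_blocks + 7) // 8)

    def mark_done(self, block_no: int) -> None:
        # Assumed meaning: the block is now valid on primary storage.
        self._bits[block_no // 8] |= 1 << (block_no % 8)

    def is_done(self, block_no: int) -> bool:
        return bool(self._bits[block_no // 8] & (1 << (block_no % 8)))

    def pending(self) -> list:
        """Block numbers that still have to be recovered."""
        return [n for n in range(self._num_blocks) if not self.is_done(n)]


bitmap = FaultyDeviceBitmap(num_blocks=10)
bitmap.mark_done(0)   # restored by a read-triggered recovery
bitmap.mark_done(9)   # overwritten by a write in the recovery state
remaining = bitmap.pending()
```

  • With this convention, the background recovery thread can stop once `pending()` is empty.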
  • The exception recovery module starts a background recovery thread.
  • The recovery thread selects the recovery procedure based on the type of device being restored. If the storage device had no read/write failure (for example, the storage device was hot-unplugged and plugged in again), it is only necessary to save the data written to the backup storage system during the abnormal period to the primary storage system and to delete the rebound metadata cache.
  • Otherwise, the exception recovery module first searches its metadata cache. If the metadata is found in the cache, it checks whether there is data on the backup storage system. If there is (that is, a rebound metadata cache entry exists), primary storage missing-block recovery operation information is constructed, and based on it the primary storage system restores the data from the backup storage system. After recovery, the rebound metadata cache entry is deleted and the faulty storage device bitmap is updated.
  • If the metadata is not found in the cache, the metadata is requested from the primary storage system, and missing-block detection is performed on the metadata obtained. If a missing block is found, the metadata is requested from the backup storage system.
  • The request information includes the missing block information (block number, data ID, offset) but does not include an external storage address. After the backup storage system returns the metadata, primary storage system missing-block recovery operation information is constructed, and based on it the primary storage system restores the data either from the backup storage system or from an external storage system. After recovery, the faulty storage device bitmap is updated.
  • After traversing the data of the primary storage system, the background recovery thread also traverses the write data operations recorded by the exception recovery module during the abnormal period, saves the data written to the backup storage system to the primary storage system, and then updates the faulty storage device bitmap and deletes the metadata cache.
  • The distributed file system may further include a primary data management module, connected between the primary storage system 13 and the metadata management unit 12, for managing the data stored by the primary storage system 13.
  • The primary data management module is responsible for responding to metadata requests and for receiving recovery operation information and carrying out data recovery.
  • There are two types of recovery operation information: one carries backup storage system address information, and the other carries external storage system address information.
  • According to the recovery operation information, the primary data management module restores the data block from either the backup storage system or the external storage system.
  • The distributed file system may further include a standby data management module, connected between the backup storage system 14 and the metadata management unit 12, for managing the data stored by the backup storage system 14 and responding to metadata requests and data operation requests.
  • The standby data management module is responsible for responding to metadata requests and for receiving backup requests and carrying out data backup.
  • There are two types of metadata requests: the first type carries external storage address information, and the second type carries no external storage system address.
  • The standby data management module handles the two types differently: if the metadata is not found on the backup storage system 14, then for a request of the first type it requests all of the data from the external storage system 15, stores it, and returns the metadata of the stored data.
  • When the backup storage system processes a backup request, it requests the data directly from the primary storage system 13 and saves it as the backup request specifies.
  • The primary storage system includes a plurality of high-speed storage devices, including but not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates.
  • The primary storage system is also responsible for monitoring the status of its own storage devices and reporting device events (such as device failure and device recovery).
  • The device failures mentioned above include storage device read/write failures and hot swapping of storage devices.
  • The backup storage system includes a plurality of high-speed storage devices and/or low-speed storage devices. The high-speed storage devices include but are not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates; the low-speed storage devices include but are not limited to storage devices with low data transfer rates.
  • According to the recovery operation information, the primary data management module restores the data block from either the backup storage system or the external storage system. That is, for recovery operation information that carries backup storage system address information, the primary data management module recovers the data block from the backup storage system; for recovery operation information that carries external storage system address information, it recovers the data block from the external storage system.
  • By constructing the recovery operation information described above, the read data service is provided and the accessed data is restored immediately.
  • Recently accessed data is also the data that most users care about.
  • In the abnormal state, reading data requires requesting metadata from the backup storage system, and the request includes the external storage address.
  • If the backup storage system does not hold the data, it first requests the data from the external storage system (equivalent to backing up all of that data from the external storage system to the backup storage system) and then returns the metadata. In this way, the read data service is provided and the data is also backed up from the external storage system to the backup storage system, so that it can be recovered from the backup storage system during recovery, which speeds up data recovery.
  • In the recovery state, metadata may be requested from the backup storage system with a request that does not include an external storage address. If the backup storage system does not itself have the requested metadata, it does not request data from the external storage system and returns null. The primary storage system then uses the external storage address to request the data from the external storage system (only the data contained in the missing block is requested, since only that data needs to be recovered).
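  • The two kinds of metadata request handled on the backup storage side can be sketched as one lookup with an optional external address. The class name `StandbyDataManager`, the `fetch_external` callback, and the dictionary-shaped metadata are hypothetical stand-ins for the standby data management module and its network call to external storage.

```python
class StandbyDataManager:
    """Answers metadata requests for the backup storage system (illustrative only)."""

    def __init__(self, fetch_external):
        # fetch_external(address, data_id) stands in for a network read
        # from the external storage system.
        self._fetch_external = fetch_external
        self._store = {}  # data_id -> bytes held on backup storage

    def get_metadata(self, data_id, external_addr=None):
        if data_id not in self._store:
            if external_addr is None:
                # Second request type: no external address, so report a miss.
                return None
            # First request type: pull the data from external storage and
            # keep it, which doubles as a backup onto the backup storage system.
            self._store[data_id] = self._fetch_external(external_addr, data_id)
        return {"data_id": data_id, "system": "backup", "length": len(self._store[data_id])}


external_reads = []


def fake_external(address, data_id):
    external_reads.append((address, data_id))
    return b"payload"


standby = StandbyDataManager(fake_external)
miss = standby.get_metadata("file-a")                          # recovery-state style request
hit = standby.get_metadata("file-a", external_addr="ext-1")    # abnormal-state style request
again = standby.get_metadata("file-a")                         # now served locally
```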
  • The distributed file system writes data to the primary storage system and uses an asynchronous backup mechanism to back it up to the backup storage system, without affecting the read and write performance of the high-speed devices.
  • Data can be restored online: recovery happens automatically while the system continues to serve external requests, which speeds up the recovery process. No computation is needed when recovering data, and even data that was not backed up can be restored from external storage;
  • with the policy-based backup mechanism, and because data that has not been backed up can be obtained from external storage, partial data backup can be used without affecting availability. In addition, inexpensive storage devices can be used to build the backup storage system, reducing product cost.
  • FIG. 3 is a schematic diagram of a process flow of a distributed file system in a normal state according to an embodiment of the present invention. As shown in FIG. 3, the process includes:
  • Step 1 The backup management module reads the data control information.
  • Step 2 The backup management module analyzes the read and write requests in the data control information according to the backup policy. If the policy requirements are met, it forms backup control information and sends it to the standby data management module, requesting that the files on the primary storage system be backed up to the standby storage server;
  • Step 3 The standby data management module receives the backup control request (operation type, target file path, source file path) sent by the backup management module and, according to that request, issues a backup request to the primary data management module;
  • Step 4 The primary data management module receives the backup request, and reads data from the storage device in the primary storage system.
  • Step 5 The primary data management module returns data to the standby data management module
  • Step 6. The standby data management module writes the data to the storage device in the standby storage system.
  • Step 7. Return the backup status to the backup management module.
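The seven steps above can be sketched in Python as follows. This is a minimal illustration only: the class names, the in-memory dictionaries standing in for storage devices, and the `policy` callable are assumptions made for the example and are not defined by the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupControlRequest:
    operation: str    # operation type, e.g. "backup"
    target_path: str  # destination on the standby storage system
    source_path: str  # file to read from the primary storage system


class PrimaryDataManager:
    """Hypothetical stand-in for the primary data management module."""

    def __init__(self, files):
        self.files = dict(files)

    def read(self, path):
        # Step 4: read the data from the primary storage device.
        return self.files[path]


class StandbyDataManager:
    """Hypothetical stand-in for the standby data management module."""

    def __init__(self, primary):
        self.primary = primary
        self.files = {}

    def handle(self, request):
        # Steps 3-5: ask the primary data manager for the source file.
        data = self.primary.read(request.source_path)
        # Step 6: write the data to the standby storage device.
        self.files[request.target_path] = data
        # Step 7: report the backup status.
        return "ok"


class BackupManager:
    """Hypothetical stand-in for the backup management module."""

    def __init__(self, standby, policy):
        self.standby = standby
        self.policy = policy  # callable deciding whether a record needs backup

    def run(self, control_records):
        statuses = {}
        for record in control_records:      # Step 1: read data control information
            if self.policy(record):         # Step 2: apply the backup policy
                request = BackupControlRequest("backup", record["path"], record["path"])
                statuses[record["path"]] = self.standby.handle(request)
        return statuses
```

In this sketch the backup runs synchronously inside `run`; the asynchronous behaviour described in the text would come from calling `run` on a background thread, which is left out to keep the example short.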
  • FIG. 4 is a schematic flowchart of a process in a distributed file system in an abnormal state and in a case of a read data request according to an embodiment of the present invention. As shown in FIG. 4, the process includes:
  • Step 1 The application sends a request for reading data to the access module.
  • Step 2 The access module issues a read metadata request to the metadata operation module.
  • Step 3 The metadata operation module forwards the read metadata request to the abnormality recovery module.
  • Step 4 The abnormality recovery module first searches for the metadata in the metadata information cache; if it is found, go to step 6;
  • Step 5 The abnormality recovery module initiates a metadata request to the primary data management module. The primary data management module receives the request, reads the metadata from the storage device in the primary storage system, and returns it to the abnormality recovery module.
  • Step 6 The abnormality recovery module checks whether the metadata returned from the metadata cache or the primary storage system has a missing block. If so, it initiates a metadata request to the standby storage system, carrying external backup control information (external storage system location and data location information). If not, it returns the metadata information to the metadata operation module and goes to step 9;
  • Step 7 If the data is backed up in the standby storage system, the standby data management module returns the metadata; otherwise it requests the data from the external storage system according to the external backup control information, stores the obtained data in the standby storage system, and returns the metadata information;
  • Step 8 The abnormality recovery module performs a block address rebinding operation: it modifies the block mapping table of the metadata that has the missing block, replaces the missing block address with the corresponding block address on the standby storage system, caches the rebound metadata, and returns all metadata to the metadata operation module;
  • Step 9 The metadata operation module returns the metadata to the access module.
  • Step 10 The access module initiates a data request to the corresponding primary storage system and the standby storage system according to the returned metadata information.
  • Step 11 The primary storage system and the standby storage system return data to the access module.
  • Step 12 The access module returns data to the application.
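The block address rebinding in step 8 can be illustrated with a short Python sketch. The representation of the block mapping table as a list of `(device_id, block_addr)` pairs, the `"standby"` marker, and the function name are assumptions chosen for the example; the patent does not specify a data layout.

```python
def rebind_missing_blocks(block_map, failed_devices, standby_lookup):
    """Return a copy of block_map with blocks on failed devices redirected.

    block_map: list of (device_id, block_addr), one entry per logical block.
    failed_devices: set of device ids reported as faulty.
    standby_lookup: dict mapping logical block index -> block address
        on the standby storage system.
    """
    rebound = []
    for index, (device_id, block_addr) in enumerate(block_map):
        if device_id in failed_devices:
            # Missing block: point the entry at the standby copy instead.
            rebound.append(("standby", standby_lookup[index]))
        else:
            # Healthy block: keep the original primary address.
            rebound.append((device_id, block_addr))
    return rebound
```

The access module can then read healthy blocks from the primary storage system and rebound blocks from the standby storage system in the same request, as steps 10 and 11 describe.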
  • FIG. 5 is a schematic diagram of a process flow of a distributed file system in an abnormal state and in the case of a write data request according to an embodiment of the present invention. As shown in FIG. 5, the process includes:
  • Step 1 The application sends a request for writing data to the access module.
  • Step 2 The access module issues a read metadata request to the metadata operation module.
  • Step 3 The metadata operation module forwards the read metadata request to the abnormality recovery module.
  • Step 4 The abnormality recovery module directly initiates a metadata request to the standby storage system.
  • Step 5 The standby data management module receives the metadata request and constructs the metadata, and returns the metadata.
  • Step 6 The abnormality recovery module returns the metadata information to the metadata operation module.
  • Step 7 The metadata operation module returns the metadata to the access module.
  • Step 8 The access module initiates a data request to the backup storage device according to the returned metadata information.
  • Step 9 The access module writes data to the backup storage system.
  • Step 10 The access module returns a write data result to the application.
  • FIG. 6 is a schematic flowchart of a process for a distributed file system to respond to a read data request in a recovery state according to an embodiment of the present invention. As shown in FIG. 6, the process includes:
  • Step 1 The application sends a request for reading data to the access module.
  • Step 2 The access module issues a read metadata request to the metadata operation module.
  • Step 3 The metadata operation module forwards the read metadata request to the abnormality recovery module.
  • Step 4 The abnormality recovery module searches the metadata cache. If the metadata is not found, it requests the metadata from the primary storage system; if it is found, it jumps to step 6;
  • Step 5 The primary storage system receives the metadata request and returns the metadata.
  • Step 6 The abnormality recovery module checks whether there is a missing block. If so, it requests metadata from the standby storage system; if not, it skips to step 11;
  • Step 7 The abnormality recovery module constructs the recovery control information and sends the information to the primary storage system according to the metadata returned by the standby storage system.
  • Step 8. The primary storage system receives the recovery control information and performs data recovery.
  • Step 9. Return the data recovery result to the abnormality recovery module.
  • Step 10 The abnormality recovery module updates the missing-block bitmap to mark the block that has been restored.
  • Step 11 Return metadata obtained from the metadata cache or the primary storage system to the metadata operation module.
  • Step 12 The metadata operation module returns the metadata to the access module.
  • Step 13 The access module initiates a data request to the corresponding primary storage system and the standby storage system according to the returned metadata information.
  • Step 14 The primary storage system and the standby storage system return data to the access module.
  • Step 15. The access module returns data to the application.
  • FIG. 7 is a schematic diagram of a process flow of a distributed file system responding to a write data request in a recovery state according to an embodiment of the present invention. As shown in FIG. 7, the process includes:
  • Step 1 The application sends a request for writing data to the access module.
  • Step 2 The access module issues a read metadata request to the metadata operation module.
  • Step 3 The metadata operation module forwards the read metadata request to the abnormality recovery module.
  • Step 4 The abnormality recovery module requests metadata from the primary storage system.
  • Step 5 The primary storage system receives the metadata request and returns the metadata.
  • Step 6 The abnormality recovery module checks whether there is a missing block. If so, it updates the bitmap of restored missing blocks, setting the bit corresponding to the missing block to 1;
  • Step 7 Return the metadata to the metadata operation module.
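A missing-block bitmap of the kind updated in step 6 might look like the following Python sketch. The class name, the one-bit-per-block layout, and the method names are assumptions for illustration; the patent only states that the bit for a missing block is set to 1.

```python
class MissingBlockBitmap:
    """One bit per block; a bit of 1 means the block no longer needs recovery."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.bits = bytearray((block_count + 7) // 8)

    def mark_restored(self, block_index):
        # Set the bit for this block to 1.
        self.bits[block_index // 8] |= 1 << (block_index % 8)

    def is_restored(self, block_index):
        return bool(self.bits[block_index // 8] & (1 << (block_index % 8)))

    def all_restored(self, missing_blocks):
        # True once every listed missing block has its bit set.
        return all(self.is_restored(i) for i in missing_blocks)
```

One plausible reading of step 6 is that a write in the recovery state overwrites the missing block with fresh data, so the block can be marked as restored without copying anything back from the standby storage system.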
  • FIG. 8 is a schematic diagram of a processing flow of a data recovery process of a distributed file system in a system recovery state according to an embodiment of the present invention. As shown in FIG. 8, the process includes:
  • Step 1 The background data recovery thread searches the metadata cache for the metadata of the data to be restored; if it is not found, skip to step 6;
  • Step 2 If the metadata is found in the metadata cache, the background data recovery thread checks whether there is data on the standby storage system. If there is no data on the standby storage system, skip to step 11;
  • Step 3 If there is data on the standby storage system, the background data recovery thread constructs the primary storage missing block recovery operation information and sends it to the primary storage system;
  • Step 4 The primary storage system receives the missing block recovery operation information and executes it, restoring the standby storage system data to the missing block, and returns the result of the missing block recovery operation;
  • Step 5 The background data recovery thread deletes the rebound metadata from the cache and jumps to step 11;
  • Step 6 If the metadata is not found in the metadata cache, the background data recovery thread requests the metadata from the primary storage system;
  • Step 7 Perform a missing block check on the returned metadata. If there is no missing block, skip to step 11;
  • Step 8 If there is a missing block, the background data recovery thread requests data recovery from the standby storage system;
  • Step 9 If the recovery in step 8 succeeds, skip to step 11;
  • Step 10 If the recovery in step 8 fails, the background data recovery thread requests data recovery from the external storage system.
  • Step 11 Update the bitmap of the faulty storage device; data recovery is complete.
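The branching in steps 1 to 11 can be traced with a small Python sketch. Everything here is a simplification under stated assumptions: dictionaries stand in for the metadata cache and the three storage systems, a `restored` set stands in for the faulty-device bitmap, and "recovery from the standby storage system fails" is modelled simply as the item being absent there.

```python
def recover_item(name, metadata_cache, primary_metadata,
                 primary, standby, external, restored):
    """Recover one item and return where the data came from (or None)."""
    source = None
    if name in metadata_cache:                     # step 1: cache hit
        if name in standby:                        # step 2: standby holds data
            primary[name] = standby[name]          # steps 3-4: restore block
            del metadata_cache[name]               # step 5: drop rebound metadata
            source = "standby"
    else:
        metadata = primary_metadata[name]          # step 6: ask primary storage
        if metadata["missing"]:                    # step 7: missing block check
            if name in standby:                    # steps 8-9: standby succeeds
                primary[name] = standby[name]
                source = "standby"
            else:                                  # step 10: fall back to external
                primary[name] = external[name]
                source = "external"
    restored.add(name)                             # step 11: update the bitmap
    return source
```

A background thread would call a function like this once per item that lived on the failed device, and notify the metadata operation module when every item has been processed.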

Abstract

Provided is a distributed file system, including an access module (11) for the distributed file system, a metadata management unit (12), a primary storage system (13), a standby storage system (14), and an external storage system (15). Among them, the access module (11), the metadata management unit (12), the primary storage system (13) and the standby storage system (14) are connected to each other via a system bus; and the external storage system (15) is connected to the primary storage system (13) and the standby storage system (14) via a network. In the distributed file system provided by the present invention, by writing data into the primary storage system (13) and backing up the same into the standby storage system (14) using an asynchronous backup mechanism, the read-write performance of high speed equipment will not be affected, online data recovery can be achieved, and data recovery can be automatically realized during data external service, accelerating the recovery process.

Description

Distributed file system

Technical Field
Embodiments of the present invention relate to data backup technologies, and in particular to a distributed file system.

Background
As the Internet keeps broadening, applications multiply across industries, streaming media in particular. The performance and reliability of content provisioning servers in a Content Delivery Network (CDN) are becoming increasingly important, and the core file systems that carry this application data face ever higher performance and reliability requirements.
For reliability, the prior art mainly relies on the redundant backup feature of Redundant Array of Independent Disks (RAID) technology. Simply put, RAID combines multiple independent hard disks (physical disks) in different ways into a disk group (a logical disk), providing higher storage performance than a single disk together with data backup. For performance, the prior art usually uses the storage striping management of a distributed file system to aggregate disk bandwidth. Striping is a management function that spreads data across multiple storage devices in fixed-size steps, so that reads fetch from multiple physical storage devices in parallel and the performance of those devices is superimposed. The data redundancy and data striping of RAID ensure the high reliability and high performance of the distributed file system.
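As an illustration of striping, the following Python sketch maps a logical byte offset to a device and an offset on that device using simple round-robin placement. The function name and the round-robin layout are assumptions for the example; real systems may place stripes differently.

```python
def locate(offset, stripe_size, device_count):
    """Map a logical byte offset to (device index, offset on that device)."""
    stripe_index = offset // stripe_size           # which stripe unit overall
    device = stripe_index % device_count           # round-robin across devices
    row = stripe_index // device_count             # stripe units already on that device
    return device, row * stripe_size + offset % stripe_size
```

Because consecutive stripe units land on different devices, a large sequential read touches all devices at once, which is where the aggregated bandwidth comes from.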
Generally, a distributed file system can be understood as being built on a network storage system. As demand for high performance keeps increasing, hard disks are gradually being replaced by solid-state drives. Because solid-state drives are very expensive, RAID schemes that back up one solid-state drive with another sharply increase server cost. RAID is commonly implemented as RAID1 or RAID5. RAID1, the level-1 RAID technique, uses full mirroring: two homogeneous storage systems perform reads and writes synchronously and mirror each other, so the system keeps working even if one disk fails. RAID5 is a storage solution that balances storage performance, data safety, and storage cost. It does not back up the stored data; instead it stores the data and the corresponding parity information across the disks that make up the RAID5 array, with the parity information and its corresponding data stored on different disks. When the data on one RAID5 disk is damaged, the remaining data and the corresponding parity information are used to recover it.
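The RAID5 recovery described above can be shown with a small Python example. RAID5 parity is conventionally a bytewise XOR of the data blocks in a stripe; the helper name and the sample blocks below are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe, and the parity block computed from them.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

This also shows why the number of failed disks matters: with a single parity block per stripe, only one missing block per stripe can be reconstructed.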
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems. In a CDN, RAID has these drawbacks: for the same capacity, a RAID array needs more disks, and RAID1 in particular needs twice as many; since solid-state drives are very expensive, storage cost drives system cost too high. With RAID5, any modification of data requires rewriting the parity, which makes writes somewhat slower, and data recovery takes a long time and may affect services. Moreover, recovery from backup is limited by the number of damaged disks.

Summary of the Invention
An embodiment of the present invention provides a distributed file system, including: an access module, a metadata management unit connected to the access module, and a primary storage system and a standby storage system each connected to the metadata management unit. The access module is also connected to the primary storage system and to the standby storage system. The access module, the metadata management unit, the primary storage system, and the standby storage system are connected through a system bus. The system further includes an external storage system, which is connected to the primary storage system and the standby storage system through a network. Here:

The access module is configured to receive read/write data requests, send a metadata request to the metadata management unit to obtain the metadata corresponding to the requested data, and use the metadata to read or write data on the primary storage system or the standby storage system;
The metadata management unit is configured to, when the access module requests the metadata, look up the location of the requested data on the primary storage system or the standby storage system, construct the metadata, and return it to the access module. The primary storage system is configured to provide the requested data to the access module when the distributed file system is in the normal state. The standby storage system is configured to provide data backup for the primary storage system when the distributed file system is in the abnormal state or the recovery state. The external storage system is configured to provide data backup for the primary storage system.

In the distributed file system provided by the embodiments of the present invention, data is written to the primary storage system and backed up to the standby storage system through an asynchronous backup mechanism, so the read and write performance of the high-speed devices is not affected. Data can also be recovered online, with recovery performed automatically while the system continues to serve data externally, which speeds up the recovery process.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the composition of a distributed file system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the composition of a distributed file system according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of the processing flow of a distributed file system in the normal state according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the processing flow of a distributed file system in the abnormal state for a read data request according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the processing flow of a distributed file system in the abnormal state for a write data request according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the processing flow of a distributed file system responding to a read data request in the recovery state according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the processing flow of a distributed file system responding to a write data request in the recovery state according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the processing flow of the data recovery process of a distributed file system in the system recovery state according to an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of the composition of a distributed file system according to an embodiment of the present invention. As shown in FIG. 1, distributed file system 1 includes access module 11, metadata management unit 12 connected to access module 11, and primary storage system 13 and standby storage system 14, each connected to metadata management unit 12. Access module 11 is also connected to primary storage system 13 and to standby storage system 14. Access module 11, metadata management unit 12, primary storage system 13, and standby storage system 14 are all located in an internal network, and these functional modules are connected through a system bus. Distributed file system 1 further includes external storage system 15, which is located in an external network and is connected to primary storage system 13 and standby storage system 14 through a network.
Access module 11 in distributed file system 1 is configured to receive read/write data requests, send a metadata request to metadata management unit 12 to obtain the metadata corresponding to the requested data, and use the metadata to read or write data on primary storage system 13 or standby storage system 14.
Access module 11 is the entry point through which distributed file system 1 accesses data. It receives read/write data requests from applications, obtains metadata from metadata management unit 12, and uses the metadata information to read or write data on primary storage system 13 and standby storage system 14. As a functional module, it can be deployed on its own processing unit, such as a PC or a board. There is at least one access module, and typically several, so as to provide high throughput to the outside of the system.
Metadata management unit 12 is configured to, when access module 11 requests the metadata, look up the location of the requested data on primary storage system 13 or standby storage system 14, construct the metadata, and return it to access module 11. It is also configured to switch the system state of distributed file system 1 according to device status events from primary storage system 13. Distributed file system 1 has three states: the normal state, the abnormal state, and the recovery state.
Normal state: no storage device of primary storage system 13 has failed. Data is stored on the primary storage system and is backed up to standby storage system 14 when needed.
Abnormal state: a storage device in primary storage system 13 has failed. Primary storage system 13 and standby storage system 14 then work together to complete data storage in a coordinated way, and the coordination result is saved for use in the data recovery process.
Recovery state: after the failed storage device of primary storage system 13 has been restored, the system is triggered to recover its data, mainly by using the backup data on standby storage system 14 to restore the data originally stored on the failed storage device.
Further, metadata management unit 12 is responsible for looking up the location of data on the storage systems and constructing the metadata when access module 11 requests metadata, and for returning it to access module 11. Metadata management unit 12 also manages the system state: it receives device status events from primary storage system 13 and, based on the event information, switches the system state, selects the storage system, and decides the data storage device (location). It is also responsible for system reliability, automatically backing up data according to data access information. When a device fails, it automatically assembles usable metadata according to how the data is distributed across the primary, standby, and external storage systems, which keeps the system available. It also handles automatic online recovery of system data, ensuring data availability and data consistency. Metadata management unit 12 can be deployed on a server of its own.
Primary storage system 13 is configured to provide the requested data to access module 11 in the normal state. Specifically, primary storage system 13 is the main storage of distributed file system 1 and holds all of the system's data, with high read and write performance as its goal. In the normal state the system reads and writes data through primary storage system 13, and all data is kept on it. It is built from high-speed storage devices, uses storage striping to aggregate storage bandwidth and improve read and write performance, and provides block access, so access module 11 can directly read and write the data stored on it in blocks.
Standby storage system 14 is configured to provide data backup for primary storage system 13 in the abnormal state and the recovery state. Specifically, standby storage system 14 is the secondary storage of distributed file system 1 and backs up data to support system reliability, availability, and data recoverability. It uses storage striping to aggregate storage bandwidth and improve read and write performance, and provides block access, so access module 11 can directly read and write the data stored on it in blocks.
External storage system 15 is configured to provide data backup for primary storage system 13. Specifically, external storage system 15 is a storage system that holds more data; it may be a system at the next layer up or another system at the same layer. External storage system 15 is connected to the other modules of the system through a network, reads and writes data by network access, and serves as a supplementary external backup: if the requested data cannot be found on primary storage system 13 or standby storage system 14, it can be requested from external storage system 15.
The internal network mentioned above directly connects access module 11, metadata management unit 12, primary storage system 13, and standby storage system 14. It may be Ethernet or an internal bus (such as a PCIe bus). Metadata is mainly information that describes the attributes of data, and it supports functions such as indicating storage location, history information, resource information, and file records. For example, it may be the storage address (which may include a storage device number, a storage block number, and a data offset) of the requested data (identified by the requested length and offset). Metadata may also be the inode number of the data. The inode of the data is stored on the storage device and identifies the storage address of the data; the storage address of the inode can be computed from the inode number.
FIG. 2 is a schematic diagram of the composition of a distributed file system according to another embodiment of the present invention. Building on the foregoing embodiment, as shown in FIG. 2, distributed file system 1 includes access module 11, metadata management unit 12, primary storage system 13, standby storage system 14, and external storage system 15. Further, metadata management unit 12 includes a metadata operation module, a backup module, and an abnormality recovery module. Here:
The metadata operation module is connected to access module 11 and to primary storage system 13. It receives the metadata request, requests metadata from primary storage system 13 or from the abnormality recovery module, and returns the metadata to access module 11. It also updates the system state according to device status events reported by primary storage system 13, manages the system state, and records the data control information in read/write data requests. Specifically, the metadata operation module receives all metadata requests from access module 11 and first searches its metadata cache. If the metadata is not found, it requests the metadata from primary storage system 13 or standby storage system 14, caches the result, and finally returns the metadata to access module 11.
In the normal state, the metadata operation module searches the metadata cache; on a miss it requests the metadata from primary storage system 13, records the data control information from the received metadata request, and then returns the metadata to access module 11. The data control information can be recorded in memory or persisted to a database. In this embodiment, metadata management unit 12 may include a data control information recording module, connected between the metadata operation module and the backup module, for persistently storing the data control information.
In the abnormal state, the metadata operation module forwards the metadata requests sent by the access module 11 to the exception recovery module, which is responsible for obtaining the metadata from the primary storage system 13 and the backup storage system 14.
In the recovery state, write requests are handled as in the normal state: the metadata operation module requests the metadata from the primary storage system 13. Read requests are handed to the exception recovery module for processing.
The metadata operation module is responsible for state management, and the primary storage system 13 is responsible for reporting device status events. When a device failure event is received, the ID of the failed device is recorded and the system state changes from normal to abnormal. When a device recovery event is received, the system state changes from abnormal to recovery. When the recovery operation is complete and the exception recovery module has notified the metadata operation module, the system state changes from recovery to normal.
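The three state transitions above form a small state machine, sketched below. The names `SystemState` and `StateManager` and the method names are hypothetical. The text does not say what happens when a further failure is reported during recovery, so this sketch simply records the device ID and leaves the state unchanged in that case.

```python
from enum import Enum


class SystemState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    RECOVERY = "recovery"


class StateManager:
    """Hypothetical sketch of the state management described in the text."""

    def __init__(self):
        self.state = SystemState.NORMAL
        self.failed_device_ids = set()

    def on_device_failure(self, device_id):
        # Record the failed device ID; normal -> abnormal.
        self.failed_device_ids.add(device_id)
        if self.state is SystemState.NORMAL:
            self.state = SystemState.ABNORMAL

    def on_device_recovery(self, device_id):
        # abnormal -> recovery once the device is reported as recovered.
        if self.state is SystemState.ABNORMAL:
            self.state = SystemState.RECOVERY

    def on_recovery_complete(self):
        # Called after the exception recovery module reports completion;
        # recovery -> normal.
        if self.state is SystemState.RECOVERY:
            self.failed_device_ids.clear()
            self.state = SystemState.NORMAL
```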
The backup module is connected to the metadata operation module and to the backup storage system 14. It reads the data control information recorded by the metadata operation module, generates data backup operation control information, and sends it to the backup storage system 14 so that the data on the primary storage system 13 is backed up to the backup storage system 14. Specifically, in the normal state the backup module runs as a background thread. It reads the data control information recorded by the metadata operation module over the preceding period, analyzes data usage according to the backup policy, and generates data backup requests. It then sends backup operation control information (operation type, target file path, source file path) to the backup storage system 14, requesting that the file on the primary storage system be backed up to the specified location on the backup storage server. Combined with the backup policy, the backup module can flexibly implement a variety of data backup schemes, including: a full backup scheme, in which only write operations are analyzed and a backup request is generated whenever data is written; a hotspot backup scheme, in which only read usage (number of read requests, read frequency) is analyzed and a backup request is generated for data that meets the hotspot condition in the policy (which may be expressed as a request count or a read frequency); a specified-data backup scheme, in which the policy specifies a data feature code and a backup request is generated for data whose feature code is found during analysis; and a scheme in which backup request information carried in the data is extracted during analysis and backup requests are generated from it.
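As an illustration of how a backup policy might turn recorded data control information into backup requests, the sketch below implements the full backup scheme and a count-based hotspot scheme. All names (`ControlRecord`, `BackupRequest`, `full_backup`, `hotspot_backup`), the `/backup` target root, and the record layout are assumptions made for the example; the text only specifies that a request carries an operation type, a target file path, and a source file path.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRecord:
    op: str    # "read" or "write"
    path: str  # file the request referred to


@dataclass(frozen=True)
class BackupRequest:
    op_type: str
    target_path: str
    source_path: str


def _request(path, target_root):
    # Hypothetical mapping from a source path to a target path.
    return BackupRequest("backup", target_root + path, path)


def full_backup(records, target_root="/backup"):
    # Full backup: only writes are analyzed; every written file is backed up once.
    written = dict.fromkeys(r.path for r in records if r.op == "write")
    return [_request(p, target_root) for p in written]


def hotspot_backup(records, min_reads, target_root="/backup"):
    # Hotspot backup: only reads are analyzed; files read at least
    # min_reads times in the analyzed period are backed up.
    reads = Counter(r.path for r in records if r.op == "read")
    return [_request(p, target_root) for p in sorted(reads) if reads[p] >= min_reads]
```

A frequency-based hotspot condition would follow the same pattern with timestamps added to `ControlRecord`.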
The exception recovery module is connected to the metadata operation module, the backup module, the primary storage system 13, and the backup storage system 14. It obtains the metadata in the abnormal state and in the recovery state and returns it to the metadata operation module. In those two states the exception recovery module is responsible for obtaining the metadata and keeping the metadata cache valid, and for controlling the data recovery operation of the primary storage system 13, so that the system remains available when the primary storage system 13 fails.
The exception recovery module may include the following submodules:
A first processing submodule, configured to, in the abnormal state and for a read data request, perform missing-block detection on the metadata obtained from the cache or from the primary storage system 13; if the metadata is found to contain missing blocks, send to the backup storage system 14 request information that includes the missing-block information and the address of the external storage system 15; and, after the backup storage system 14 returns the metadata, perform a block address rebinding operation, reassemble usable metadata, and send it. Specifically, in the abnormal state and for a read data request, the exception recovery module, through the first processing submodule, first searches its metadata cache. If the metadata is not found, it requests the metadata from the primary storage system 13. It then performs missing-block detection on the metadata; a block is missing if its data is stored on a failed storage device, which is determined by checking the storage device ID. If missing blocks are found, it requests metadata from the backup storage system 14; the request information includes the missing-block information (block number, data ID, offset) and the address of the external storage system 15. After the backup storage system 14 returns the metadata, the block address rebinding operation is performed. Block address rebinding means replacing the metadata entries for the missing blocks on the primary storage system 13 with the metadata entries for the corresponding blocks on the backup storage system 14, reassembling usable metadata, and caching that metadata to speed up later metadata retrieval.
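Missing-block detection and block address rebinding can be illustrated with the sketch below. The `BlockEntry` layout and the function names are hypothetical; the text only states that a block is treated as missing when its storage device ID belongs to a failed device, and that rebinding swaps in the backup system's entry for each such block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    block_no: int
    device_id: str  # storage device holding the block
    address: str    # where the block can be read from


def find_missing_blocks(entries, failed_device_ids):
    # A block is missing if it lives on a failed storage device.
    return [e.block_no for e in entries if e.device_id in failed_device_ids]


def rebind(primary_entries, backup_entries, failed_device_ids):
    # Replace each missing block's entry with the backup system's entry.
    backup_by_no = {e.block_no: e for e in backup_entries}
    rebound = []
    for entry in primary_entries:
        if entry.device_id in failed_device_ids:
            # Raises KeyError if the backup system did not return this block.
            rebound.append(backup_by_no[entry.block_no])
        else:
            rebound.append(entry)
    return rebound
```

After rebinding, running `find_missing_blocks` on the result returns an empty list, which is what makes the reassembled metadata usable by the access module.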
A second processing submodule, configured to, in the abnormal state and for a write data request, request the metadata only from the backup storage system 14 and return it to the metadata operation module; and further configured to record the write operations, so that during data recovery the data can be synchronized to the primary storage system 13 according to the recorded write operations. Specifically, in the abnormal state and for a write data request, the exception recovery module, through the second processing submodule, automatically selects the backup storage system 14 as the storage target, requests the metadata only from the backup storage system 14, and returns the result to the metadata operation module. The access module 11 therefore writes all of the data to the backup storage system 14 according to the metadata. The exception recovery module also records the write operations, and during data recovery the data is synchronized to the primary storage system 13 according to these records.
In the recovery state, the exception recovery module is responsible for online data recovery, so data service and data recovery can proceed at the same time. The exception recovery module uses a bitmap for the failed storage device to track which missing blocks have been recovered, and it is responsible for keeping the bitmap up to date during recovery. When detecting missing blocks, it first checks the storage device ID and then checks the bitmap entry for the data block to see whether the block has already been recovered.
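A per-device recovery bitmap of the kind described above might look like the following sketch. The class name `RecoveryBitmap`, the helper `is_missing`, and the convention that a bit value of 1 means "recovered" and that blocks are indexed within a device are assumptions made for the example.

```python
class RecoveryBitmap:
    """Hypothetical bitmap: one bit per block of a failed storage device."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.bits = bytearray((block_count + 7) // 8)

    def _check(self, block_no):
        if not 0 <= block_no < self.block_count:
            raise IndexError(block_no)

    def mark_recovered(self, block_no):
        self._check(block_no)
        self.bits[block_no // 8] |= 1 << (block_no % 8)

    def is_recovered(self, block_no):
        self._check(block_no)
        return bool(self.bits[block_no // 8] & (1 << (block_no % 8)))

    def all_recovered(self):
        return all(self.is_recovered(i) for i in range(self.block_count))


def is_missing(device_id, block_no, failed_bitmaps):
    # First check the storage device ID, then the block's bit in the bitmap.
    bitmap = failed_bitmaps.get(device_id)
    return bitmap is not None and not bitmap.is_recovered(block_no)
```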
The exception recovery module may further include a third processing submodule, configured to, in the recovery state and for a read data request, perform missing-block detection on the metadata obtained from the cache or from the primary storage system 13; if the metadata is found to contain missing blocks, send to the backup storage system 14 request information that includes the missing-block information; after the backup storage system 14 returns the metadata, construct missing-block recovery operation information for the primary storage system 13 to use in recovering the data; and update the failed-storage-device bitmap of the primary storage system 13 and return the metadata.
Specifically, in the recovery state and for a read data request, the exception recovery module, through the third processing submodule, first searches its metadata cache. If the metadata is not found, it requests the metadata from the primary storage system 13. It then performs missing-block detection on the metadata obtained from the cache or from the primary storage system 13. If missing blocks are found, it requests metadata from the backup storage system 14, passing the missing-block information (block number, data ID, offset). After the backup storage system 14 returns the metadata, it constructs missing-block recovery operation information for the primary storage system 13, and the primary storage system 13 uses this information to recover the data from either the backup storage system 14 or the external storage system 15. After recovery, the failed-storage-device bitmap is updated and the metadata is returned.
The exception recovery module may further include a fourth processing submodule, configured to, in the recovery state and for a write data request, request the metadata from the primary storage system 13 and, after obtaining the metadata, update the failed-storage-device bitmap of the primary storage system 13 if the metadata is found to contain missing blocks, and return the metadata. Specifically, in the recovery state and for a write data request, the exception recovery module, through the fourth processing submodule, requests the metadata from the primary storage system 13. After obtaining the metadata, it detects missing blocks by the failed storage device ID. If there are missing blocks, it updates the failed-storage-device bitmap directly and then returns the result to the metadata operation module.
In the recovery state, the exception recovery module starts a background recovery thread. The recovery thread selects the recovery procedure according to the type of the recovered device. If the storage device has no read/write fault (for example, the storage device was hot-unplugged and then plugged back in), it is only necessary to save to the primary storage system the data that was written to the backup storage system during the abnormal period, and to delete the rebound metadata cache entries.
If the storage device had a read/write fault, the data of the primary storage system is traversed and recovered as follows. The exception recovery module first searches its metadata cache. If a cached metadata entry is found, it checks whether any of the data resides on the backup storage system, that is, whether the cached metadata has been rebound. If so, it constructs missing-block recovery operation information for the primary storage system, and the primary storage system uses this information to recover the data from the backup storage system. After recovery, the rebound metadata cache entry is deleted and the failed-storage-device bitmap is updated.
If the metadata is not found in the metadata cache, the metadata is requested from the primary storage system, and missing-block detection is performed on the metadata obtained. If missing blocks are found, metadata is requested from the backup storage system; the request information includes the missing-block information (block number, data ID, offset) but not the external storage address. After the backup storage system returns the metadata, missing-block recovery operation information is constructed for the primary storage system, and the primary storage system uses this information to recover the data from either the backup storage system or the external storage system. After recovery, the failed-storage-device bitmap is updated.
After traversing the data of the primary storage system, the background recovery thread also traverses the write operations recorded by the exception recovery module during the abnormal period, saves the data that was written to the backup storage system to the primary storage system, and then updates the failed-storage-device bitmap and deletes the metadata cache entries.
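The final replay step can be sketched as below. This is a deliberately simplified illustration: the two storage systems are modeled as plain dictionaries keyed by path, the write log is a list of paths, and the function name `replay_write_log` is hypothetical. Bitmap updates and cache deletion are omitted.

```python
def replay_write_log(write_log, backup_store, primary_store):
    """Copy data written to the backup system during the abnormal period
    back to the primary system. Returns the paths that were synchronized."""
    synced = []
    # dict.fromkeys removes duplicate log entries while keeping log order,
    # so a file written several times is copied only once.
    for path in dict.fromkeys(write_log):
        primary_store[path] = backup_store[path]
        synced.append(path)
    return synced
```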
As shown in FIG. 2, the distributed file system may further include a primary data management module, connected between the primary storage system 13 and the metadata management unit 12, for managing the data stored in the primary storage system 13 and responding to metadata requests and data operation requests. Specifically, the primary data management module is responsible for responding to metadata requests and for receiving recovery operation information and carrying out data recovery. There are two kinds of recovery operation information: one carries the address information of the backup storage system, and the other carries the address information of the external storage system. According to the recovery operation information, the primary data management module recovers the data blocks from either the backup storage system or the external storage system.
The distributed file system may further include a backup data management module, connected between the backup storage system 14 and the metadata management unit 12, for managing the data stored in the backup storage system 14 and responding to metadata requests and data operation requests. Specifically, the backup data management module is responsible for responding to metadata requests and for receiving backup requests and carrying out data backup. There are two kinds of metadata request: the first kind carries external storage address information, and the second kind does not. The backup data management module treats them differently when the metadata cannot be found in the backup storage system 14: for the first kind, it requests all of the data from the external storage system 15, stores it, and returns the metadata of the stored data. When handling a backup request, the backup storage system 14 requests the data directly from the primary storage system 13 according to the backup request and saves it.
In the distributed file system described above, the primary storage system includes several high-speed storage devices, which include but are not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates. The primary storage system is also responsible for monitoring the state of its own storage devices and reporting device events (such as device failure and device recovery). Device failures include storage device read/write faults and hot-swapping of a storage device. The backup storage system includes several high-speed storage devices and/or low-speed storage devices, where the high-speed storage devices include but are not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates, and the low-speed storage devices include but are not limited to storage devices with low data transfer rates.
The recovery operation information described in the above embodiments is of two kinds: one carries the address information of the backup storage system, and the other carries the address information of the external storage system. According to the recovery operation information, the primary data management module recovers the data blocks from either the backup storage system or the external storage system. That is, for recovery operation information carrying backup storage system address information, the primary data management module recovers the data blocks from the backup storage system; for recovery operation information carrying external storage system address information, it recovers the data blocks from the external storage system. In the recovery state, this recovery operation information is constructed whenever data is read, so that the read is served and the accessed data is recovered immediately. By the principle of data locality, recently accessed data is the data most users care about, and recovering it first helps performance.
Regarding the external storage address: in the abnormal state, a read requires requesting metadata from the backup storage system, and this request includes the external storage address. The intent is that, when the backup storage system does not itself hold the requested metadata, it first requests the data from the external storage system (in effect backing up all of the data from the external storage system to the backup storage system) and then returns the metadata. This both keeps the read service available and backs the data up from the external storage system to the backup storage system, so that recovery can later take the data from the backup storage system, which speeds up data recovery. In the recovery state, a read or the background data recovery thread may request metadata from the backup storage system, and this request does not include the external storage address. If the backup storage system does not itself hold the requested metadata, it does not request the data from the external storage system and returns an empty result. The primary storage system then uses the external storage address to request the data from the external storage system (requesting only the data contained in the missing blocks, since recovery only needs to restore the data contained in the missing blocks).
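The difference between the two kinds of metadata request can be shown with a short sketch. The class `BackupDataManager`, its dictionary-based storage, and the shape of the returned metadata are all hypothetical; only the behavior (fetch from external storage when an address is supplied, otherwise return an empty result) comes from the text.

```python
class BackupDataManager:
    """Hypothetical sketch of how the backup side answers metadata requests."""

    def __init__(self, external_systems):
        self.files = {}  # data_id -> bytes held by the backup storage system
        # external_systems maps an external address to {data_id: bytes}.
        self.external_systems = external_systems

    def get_metadata(self, data_id, external_address=None):
        if data_id not in self.files:
            if external_address is None:
                # Second kind of request (recovery state): return empty.
                return None
            # First kind of request (abnormal state): pull all of the data
            # from external storage and keep it on the backup system.
            self.files[data_id] = self.external_systems[external_address][data_id]
        return {"data_id": data_id, "location": "backup",
                "size": len(self.files[data_id])}
```

Once a request of the first kind has pulled the data in, later requests of either kind are answered from the backup storage system.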
The distributed file system provided by the embodiments of the present invention writes data to the primary storage system and backs it up to the backup storage system through an asynchronous backup mechanism, which does not affect the read/write performance of the high-speed devices. Data can be recovered online: recovery happens automatically while the data is being served, which speeds up the recovery process. Recovering data requires no computation, and data that has not been backed up can still be recovered from external storage. In a CDN environment, the policy-based backup mechanism, together with available external storage from which data that has not been backed up can be fetched, allows only part of the data to be backed up without affecting availability. In addition, the backup storage system can be built from inexpensive storage devices, reducing product cost.
FIG. 3 is a schematic flowchart of the processing performed by the distributed file system in the normal state according to an embodiment of the present invention. As shown in FIG. 3, the flow includes:
Step 1. The backup management module reads the data control information;
Step 2. The backup management module analyzes the read and write requests in the data control information according to the backup policy; if the policy requirements are met, it composes backup control information and sends it to the backup data management module, requesting that the file on the primary storage system be backed up to the backup storage server;
Step 3. The backup data management module receives the backup control request (operation type, target file path, source file path) sent by the backup management module and, according to the backup control request information, issues a backup request to the primary data management module;
Step 4. The primary data management module receives the backup request and reads the data from the storage devices in the primary storage system;
Step 5. The primary data management module returns the data to the backup data management module;
Step 6. The backup data management module writes the data to the storage devices in the backup storage system;
Step 7. The backup status is returned to the backup management module.
FIG. 4 is a schematic flowchart of the processing performed by the distributed file system for a read data request in the abnormal state according to an embodiment of the present invention. As shown in FIG. 4, the flow includes:
Step 1. The application sends a read data request to the access module;
Step 2. The access module sends a read-metadata request to the metadata operation module;
Step 3. The metadata operation module forwards the read-metadata request to the exception recovery module;
Step 4. The exception recovery module first looks up the metadata in the metadata cache; if it is found, go to step 6;
Step 5. The exception recovery module sends a metadata request to the primary data management module; the primary data management module receives the request, reads the metadata from the storage devices in the primary storage system, and returns the metadata to the exception recovery module;
Step 6. The exception recovery module checks whether the metadata returned from the metadata cache or the primary storage system contains missing blocks. If it does, the module sends a metadata request to the backup storage system, carrying external backup control information (including the external storage system location and the data location information). If it does not, the module returns the metadata to the metadata operation module and goes to step 9;
Step 7. If the data has already been backed up in the backup storage system, the backup data management module returns the metadata; otherwise the backup data management module requests the data from the external storage system according to the external backup control information, stores the data obtained in the backup storage system, and returns the metadata;
Step 8. The exception recovery module performs the block address rebinding operation: it modifies the block mapping table in the metadata for the missing blocks, replacing each missing block address with the corresponding block address on the backup storage system, caches the rebound metadata, and returns all of the metadata to the metadata operation module;
Step 9. The metadata operation module returns the metadata to the access module;
Step 10. According to the returned metadata, the access module sends I/O data requests to the corresponding primary storage system and backup storage system;
Step 11. The primary storage system and the backup storage system return the data to the access module;
Step 12. The access module returns the response data to the application.
FIG. 5 is a schematic flowchart of the processing performed by the distributed file system for a write data request in the abnormal state according to an embodiment of the present invention. As shown in FIG. 5, the flow includes:
Step 1. The application sends a write data request to the access module;
Step 2. The access module sends a read-metadata request to the metadata operation module;
Step 3. The metadata operation module forwards the read-metadata request to the exception recovery module;
Step 4. The exception recovery module sends a metadata request directly to the backup storage system;
Step 5. The backup data management module receives the metadata request, constructs the metadata, and returns it;
Step 6. The exception recovery module returns the metadata to the metadata operation module;
Step 7. The metadata operation module returns the metadata to the access module;
Step 8. According to the returned metadata, the access module sends an I/O data request to the backup storage device;
Step 9. The access module writes the data to the backup storage system;
Step 10. The access module returns the write result to the application.
FIG. 6 is a schematic flowchart of the processing performed by the distributed file system in response to a read data request in the recovery state according to an embodiment of the present invention. As shown in FIG. 6, the flow includes:
Step 1. The application sends a read data request to the access module;
Step 2. The access module sends a read-metadata request to the metadata operation module;
Step 3. The metadata operation module forwards the read-metadata request to the exception recovery module;
Step 4. The exception recovery module searches the metadata cache; if the metadata is found, go to step 6; otherwise it requests the metadata from the primary storage system;
Step 5. The primary storage system receives the metadata request and returns the metadata;
Step 6. The exception recovery module checks whether there are missing blocks; if so, it requests metadata from the backup storage system; if not, go to step 11;
Step 7. According to the metadata returned by the backup storage system, the exception recovery module constructs recovery control information and sends it to the primary storage system;
Step 8. The primary storage system receives the recovery control information and performs data recovery;
Step 9. The data recovery result is returned to the exception recovery module;
Step 10. The exception recovery module updates the bitmap of recovered missing blocks;
Step 11. The metadata obtained from the metadata cache or the primary storage system is returned to the metadata operation module;
Step 12. The metadata operation module returns the metadata to the access module;
Step 13. According to the returned metadata, the access module sends I/O data requests to the corresponding primary storage system and backup storage system;
Step 14. The primary storage system and the backup storage system return the data to the access module;
Step 15. The access module returns the response data to the application.
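Step 7 of the flow above, constructing the recovery control information, can be sketched as follows. The `RecoveryInfo` layout and the function `build_recovery_info` are hypothetical. The sketch assumes that a block found in the metadata returned by the backup storage system is recovered from there, and that any other missing block falls back to the external storage system, which matches the two kinds of recovery operation information described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryInfo:
    block_no: int
    source: str   # "backup" or "external"
    address: str  # where the primary storage system should fetch the block


def build_recovery_info(missing_blocks, backup_metadata, external_address):
    """backup_metadata maps block_no -> backup address, or is None when the
    backup storage system returned an empty result."""
    infos = []
    for block_no in missing_blocks:
        if backup_metadata is not None and block_no in backup_metadata:
            infos.append(RecoveryInfo(block_no, "backup", backup_metadata[block_no]))
        else:
            # Not on the backup system: recover from external storage.
            infos.append(RecoveryInfo(block_no, "external", external_address))
    return infos
```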
FIG. 7 is a schematic flowchart of how the distributed file system according to an embodiment of the present invention responds to a write data request in the recovery state. As shown in FIG. 7, the flow includes:
Step 1: The application sends a write data request to the access module.
Step 2: The access module sends a read-metadata request to the metadata operation module.
Step 3: The metadata operation module forwards the read-metadata request to the abnormality recovery module.
Step 4: The abnormality recovery module requests the metadata from the primary storage system.
Step 5: The primary storage system receives the metadata request and returns the metadata.
Step 6: The abnormality recovery module checks whether there are missing blocks. If there are, it updates the bitmap of recovered missing blocks, setting the bit corresponding to each missing block to 1.
Step 7: The abnormality recovery module returns the metadata to the metadata operation module.
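Steps 4 to 7 of the write flow above can be sketched in a few lines. The function name, the dictionary layout of the metadata, and the list used as a bitmap are assumptions made for illustration; the source only states that the bit for each missing block is set to 1.

```python
def handle_write_metadata(primary_meta, name, recovered_bitmap):
    """Return metadata for a write in the recovery state, marking missing blocks.

    primary_meta: file name -> {block index: present on primary?} (hypothetical layout)
    recovered_bitmap: list of 0/1 flags indexed by block index
    """
    # Steps 4-5: the metadata always comes from the primary storage system.
    meta = primary_meta[name]
    # Step 6: set the bit of every missing block to 1 in the recovered-block bitmap.
    for block, present in meta.items():
        if not present:
            recovered_bitmap[block] = 1
    # Step 7: return the metadata to the metadata operation module.
    return meta
```

Unlike the read flow, no data is copied from the backup storage system here; only the bitmap changes.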
FIG. 8 is a schematic flowchart of the data recovery process of the distributed file system according to an embodiment of the present invention in the system recovery state. As shown in FIG. 8, the flow includes:
Step 1: The background data recovery thread searches the metadata cache for the metadata of the data to be recovered. If the metadata is not found, the flow jumps to step 6.
Step 2: If the metadata is found in the cache, the background data recovery thread checks whether any of the data resides on the backup storage system. If none does, the flow jumps to step 11.
Step 3: If data resides on the backup storage system, the background data recovery thread constructs missing-block recovery operation information for the primary storage and sends it to the primary storage system.
Step 4: The primary storage system receives and executes the missing-block recovery operation information, restores the data from the backup storage system to the missing blocks, and returns the result of the operation.
Step 5: The background data recovery thread deletes the re-bound metadata cache entry, and the flow jumps to step 11.
Step 6: If the metadata is not found in the metadata cache, the background data recovery thread requests the metadata from the primary storage system.
Step 7: The returned metadata is checked for missing blocks. If there are none, the flow jumps to step 11.
Step 8: If there are missing blocks, the background data recovery thread requests data recovery from the backup storage system.
Step 9: If step 8 succeeds, the flow jumps to step 11.
Step 10: If step 8 fails, the background data recovery thread requests data recovery from the external storage system.
Step 11: The bitmap of the failed storage device is updated, and recovery of this data is complete.
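The branching in the background recovery flow above can be sketched as a single function. Everything here is hypothetical scaffolding: the stores are plain dictionaries, a cache entry is modeled as the list of block ids that were re-bound to the backup storage system, and the failed-device bitmap is modeled as a set of recovered block ids.

```python
def recover(name, cache, primary_meta, primary, backup, external, bitmap):
    """Run one pass of the background recovery flow for `name`.

    Returns a dict mapping each restored block id to the source it came from.
    """
    sources = {}
    if name in cache:
        # Steps 1-2: cache hit; find the blocks that reside on the backup store.
        on_backup = [block for block in cache[name] if block in backup]
        if on_backup:
            # Steps 3-4: restore those blocks on the primary store.
            for block in on_backup:
                primary[block] = backup[block]
                sources[block] = "backup"
            # Step 5: drop the re-bound cache entry.
            del cache[name]
    else:
        # Step 6: cache miss; fetch the metadata from the primary store.
        meta = primary_meta[name]
        # Step 7: check for missing blocks.
        missing = [block for block, present in meta.items() if not present]
        for block in missing:
            if block in backup:
                # Steps 8-9: recover from the backup store.
                primary[block] = backup[block]
                sources[block] = "backup"
            else:
                # Step 10: fall back to the external store.
                primary[block] = external[block]
                sources[block] = "external"
            meta[block] = True
    # Step 11: update the failed-device bitmap.
    bitmap.update(sources.keys())
    return sources
```

In this sketch a block absent from the backup dictionary stands in for a failed recovery in step 8; a real system would act on the result of the recovery request instead.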
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A distributed file system, comprising: an access module; a metadata management unit connected to the access module; and a primary storage system and a backup storage system, each connected to the metadata management unit, wherein the access module is further connected to the primary storage system and to the backup storage system; the access module, the metadata management unit, the primary storage system, and the backup storage system are connected to one another through a system bus; and the distributed file system further comprises an external storage system connected to the primary storage system and the backup storage system through a network; wherein:
the access module is configured to receive a read/write data request, send a metadata request to the metadata management unit to obtain the metadata corresponding to the requested data, and use the metadata to read data from or write data to the primary storage system or the backup storage system;
the metadata management unit is configured to, when the access module requests the metadata, look up the location of the requested data on the primary storage system or the backup storage system, construct the metadata, and return it to the access module;
the primary storage system is configured to provide the requested data to the access module when the distributed file system is in a normal state;
the backup storage system is configured to provide data backup for the primary storage system when the distributed file system is in an abnormal state or a recovery state; and
the external storage system is configured to provide data backup for the primary storage system.
2. The distributed file system according to claim 1, wherein the metadata management unit comprises a metadata operation module, a backup module, and an abnormality recovery module, wherein:
the metadata operation module is connected to the access module and to the primary storage system, and is configured to receive the metadata request, request metadata from the primary storage system or the abnormality recovery module, and return the metadata to the access module; to update the system state according to device status events reported by the primary storage system; and to record the data control information in the read/write data request;
the backup module is connected to the metadata operation module and the backup storage system, and is configured to read the data control information recorded by the metadata operation module, generate data backup operation control information, and send it to the backup storage system, so that the data on the primary storage system is backed up to the backup storage system; and
the abnormality recovery module is connected to the metadata operation module, the backup module, the primary storage system, and the backup storage system, and is configured to obtain the metadata in the abnormal state and the recovery state and return it to the metadata operation module.
3. The distributed file system according to claim 2, wherein the metadata management unit further comprises:
a data control information recording module, connected between the metadata operation module and the backup module, and configured to store the data control information.
4. The distributed file system according to claim 2, wherein the abnormality recovery module comprises:
a first processing sub-module, configured to, in the abnormal state and for a read data request, perform missing-block detection on metadata obtained from the cache or from the primary storage system; if the metadata is found to have missing blocks, send to the backup storage system request information that includes the missing-block information and the address of the external storage system; and, after the backup storage system returns the metadata, perform a block address re-binding operation, reassemble usable metadata, and send it;
a second processing sub-module, configured to, in the abnormal state and for a write data request, request metadata only from the backup storage system and return it to the metadata operation module; and further configured to record write data operations, so that during data recovery the data is synchronized to the primary storage system according to the recorded write data operations;
a third processing sub-module, configured to, in the recovery state and for a read data request, perform missing-block detection on metadata obtained from the cache or from the primary storage system; if the metadata is found to have missing blocks, send to the backup storage system request information that includes the missing-block information; after the backup storage system returns the metadata, construct missing-block recovery operation information for the primary storage system to use in data recovery; and update the failed-storage-device bitmap of the primary storage system and return the metadata; and
a fourth processing sub-module, configured to, in the recovery state and for a write data request, request metadata from the primary storage system and, after obtaining the metadata, if the metadata is found to have missing blocks, update the failed-storage-device bitmap of the primary storage system and return the metadata.
5. The distributed file system according to claim 1, 2, 3, or 4, further comprising:
a primary data management module, connected between the primary storage system and the metadata management unit, and configured to manage the data stored by the primary storage system and to respond to metadata requests and data operation requests; and
a backup data management module, connected between the backup storage system and the metadata management unit, and configured to manage the data stored by the backup storage system and to respond to metadata requests and data operation requests.
6. The distributed file system according to claim 5, wherein the primary storage system comprises a plurality of high-speed storage devices, the high-speed storage devices including but not limited to SCSI hard disks, SATA hard disks, and SSDs with high data transfer rates; and
the backup storage system comprises a plurality of high-speed storage devices and/or low-speed storage devices, wherein the high-speed storage devices include SCSI hard disks, SATA hard disks, and SSDs.
7. The distributed file system according to claim 1, wherein:
the normal state means that the primary storage system has not failed;
the abnormal state means that the primary storage system has failed, and that the primary storage system and the backup storage system work together to complete data storage in coordination and save the coordination result, the coordination result being used in the data recovery process; and
the recovery state means that, after the abnormal state, the data on the backup storage system is used to recover the data in the primary storage system.
PCT/CN2011/079685 2010-12-08 2011-09-15 Distributed file system WO2012075845A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2010105872357A CN102024044B (en) 2010-12-08 2010-12-08 Distributed file system
CN201010587235.7 2010-12-08

Publications (1)

Publication Number Publication Date
WO2012075845A1 true WO2012075845A1 (en) 2012-06-14

Family

ID=43865341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079685 WO2012075845A1 (en) 2010-12-08 2011-09-15 Distributed file system

Country Status (2)

Country Link
CN (1) CN102024044B (en)
WO (1) WO2012075845A1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024044B (en) * 2010-12-08 2012-11-21 华为技术有限公司 Distributed file system
CN103095767B (en) * 2011-11-03 2019-04-23 中兴通讯股份有限公司 Distributed cache system and data reconstruction method based on distributed cache system
CN103220162B (en) * 2012-01-19 2016-08-31 百度在线网络技术(北京)有限公司 The fault-tolerant optimization method and device of SCSI based on HDFS
GB2503016B (en) * 2012-06-14 2017-10-04 Draeger Safety Uk Ltd A telemetry monitoring system and a data recovery method for a telemetry monitoring system
CN103516736A (en) * 2012-06-20 2014-01-15 中兴通讯股份有限公司 Data recovery method of distributed cache system and a data recovery device of distributed cache system
CN102867035B (en) * 2012-08-28 2015-09-23 浪潮(北京)电子信息产业有限公司 A kind of distributed file system cluster high availability method and device
CN102890716B (en) * 2012-09-29 2017-08-08 南京中兴新软件有限责任公司 The data back up method of distributed file system and distributed file system
CN103049390B (en) * 2012-12-14 2016-03-09 华为技术有限公司 The data processing method of apply metadata and storage system
CN103902349B (en) * 2012-12-27 2017-05-31 中国移动通信集团江西有限公司 A kind of virtual platform storage managing server and its management method
CN103167026B (en) * 2013-02-06 2016-05-18 数码辰星科技发展(北京)有限公司 A kind of cloud store environmental data processing method, system and equipment
CN103207894A (en) * 2013-03-14 2013-07-17 深圳市知正科技有限公司 Multipath real-time video data storage system and cache control method thereof
CN103347086B (en) * 2013-07-11 2016-06-29 南京大学 Collaborative kernel construction method based on Distributed Coordination algorithm
WO2015015502A1 (en) * 2013-07-29 2015-02-05 Hewlett-Packard Development Company, L.P. Writing to files and file meta-data
CN103473184B (en) * 2013-08-01 2016-08-10 记忆科技(深圳)有限公司 The caching method of file system and system
CN104657392B (en) * 2013-11-25 2020-02-11 腾讯科技(深圳)有限公司 Method and device for realizing retrieval abnormity restoration
CN104135539B (en) 2014-08-15 2018-03-16 华为技术有限公司 Date storage method, SDN controllers and distributed network storage system
CN104202387B (en) * 2014-08-27 2017-11-24 华为技术有限公司 A kind of metadata restoration methods and relevant apparatus
CN105915600A (en) * 2016-04-13 2016-08-31 乐视控股(北京)有限公司 Data writing-in method based CDN network system and CDN network system thereof
CN108108422A (en) * 2017-12-15 2018-06-01 郑州云海信息技术有限公司 A kind of metadata acquisition methods, device and the medium of Ceph file system
CN110096220B (en) * 2018-01-31 2020-06-26 华为技术有限公司 Distributed storage system, data processing method and storage node
CN108388604B (en) * 2018-02-06 2022-06-10 平安科技(深圳)有限公司 User authority data management apparatus, method and computer readable storage medium
CN109327539A (en) * 2018-11-15 2019-02-12 上海天玑数据技术有限公司 A kind of distributed block storage system and its data routing method
CN110659157A (en) * 2019-08-30 2020-01-07 安徽芃睿科技有限公司 Distributed multi-language retrieval platform and method for lossless recovery
CN111026432A (en) * 2019-12-06 2020-04-17 中国建设银行股份有限公司 Big data processing platform, platform construction method and storage medium
CN112532525B (en) * 2020-11-25 2022-11-25 北京金山云网络技术有限公司 Processing method, device and system for equipment recovery service

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1418422A (en) * 2000-02-04 2003-05-14 里逊·Com股份有限公司 System for disributed media network and meta data server
CN101539873A (en) * 2009-04-15 2009-09-23 成都市华为赛门铁克科技有限公司 Data recovery method, data node and distributed file system
CN102024044A (en) * 2010-12-08 2011-04-20 华为技术有限公司 Distributed file system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6728849B2 (en) * 2001-12-14 2004-04-27 Hitachi, Ltd. Remote storage system and method
WO2004047078A2 (en) * 2002-11-20 2004-06-03 Filesx Ltd. Fast backup storage and fast recovery of data (fbsrd)
CN1955939A (en) * 2006-10-13 2007-05-02 清华大学 Backup and recovery method based on virtual flash disk
CN101394424B (en) * 2008-10-27 2011-11-09 中国科学院计算技术研究所 Hard disc stage network data backup system and method
CN101408855B (en) * 2008-11-07 2010-06-02 北京威视数据系统有限公司 Method for protecting remote backup equipment of temporary abnormity by continuous data protective system
CN101436151B (en) * 2008-12-01 2012-01-11 成都索贝数码科技股份有限公司 Data real time backup method and system based on file system
CN101436149B (en) * 2008-12-19 2010-06-30 华中科技大学 Method for rebuilding data of magnetic disk array

Also Published As

Publication number Publication date
CN102024044A (en) 2011-04-20
CN102024044B (en) 2012-11-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11847286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11847286

Country of ref document: EP

Kind code of ref document: A1