CN116841459A - Single-copy high-availability method of distributed file system - Google Patents


Info

Publication number
CN116841459A
Authority
CN
China
Legal status
Pending
Application number
CN202310537898.5A
Other languages
Chinese (zh)
Inventor
杨兴博
Current Assignee
Beijing Yanrong Technology Co ltd
Original Assignee
Beijing Yanrong Technology Co ltd
Priority date
2023-05-15
Filing date
2023-05-15
Publication date
2023-10-03
Application filed by Beijing Yanrong Technology Co ltd
Priority to CN202310537898.5A
Publication of CN116841459A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device


Abstract

The invention discloses a single-copy high-availability method for a distributed file system, comprising the following steps: S1, a user creates 10 files, file1, file2, ..., file10, at a client; S2, the client sends the data of file1-file10 to a storage server; S3, while the service on node 1 is normal, the storage server writes the data files uniformly to disks 1-10, which node 1 mounts over iSCSI. The invention relates to the technical field of file systems. The method reduces storage overhead and is better suited to cloud-native scenarios. Most current cloud storage products already have their own replication mechanisms; the new scheme makes effective use of cloud storage space without adding extra redundancy, avoids the performance cost of writing multiple copies, and separates services from data so that service reliability no longer affects data availability. In addition, once services and data are separated, the storage devices can be adjusted flexibly without adjusting the service devices.

Description

Single-copy high-availability method of distributed file system
Technical Field
The invention relates to the technical field of file systems, in particular to a single-copy high-availability method of a distributed file system.
Background
Distributed file systems have been an important part of the development of storage systems in recent years, and high availability is particularly important for storage services. The traditional single-copy mode does not guarantee high availability: when the service goes down, storage may be interrupted until the service resumes. Currently, most storage systems on the market use data redundancy, such as multiple copies or erasure coding (EC), to ensure service availability.
However, as cloud storage gains ground, the data redundancy of the file system stacks on top of the data redundancy of the cloud storage, which sharply increases storage cost. For example, a two-copy file system running on cloud storage that itself keeps three replicas stores every byte six times.
An IP-SAN storage-based approach is provided herein to ensure service availability of a file system in a single copy scenario.
Disclosure of Invention
To address the shortcomings of the prior art, namely that the single-copy mode lacks high availability while the multi-copy mode wastes storage space, the invention provides a single-copy high-availability method for a distributed file system.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a single-copy high availability method of a distributed file system specifically comprises the following steps:
S1, a user creates 10 files, file1, file2, ..., file10, at a client;
S2, the client sends the data of file1-file10 to a storage server;
S3, while the service on node 1 is normal, the storage server writes the data files uniformly to disks 1-10, which node 1 mounts over iSCSI;
S4, when the service on node 1 fails, the storage server writes the data files uniformly to disks 1-10, which node 2 mounts over iSCSI;
S5, when the user reads data at the client, for example file1, the client sends a read request to the storage server;
S6, while node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
and S7, when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
Preferably, in S3, for example, file1 is stored on disk 1, file2 on disk 2, and so on, with file10 stored on disk 10.
Preferably, in S4, for example, file1 is stored on disk 1, file2 on disk 2, and so on, with file10 stored on disk 10.
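The routing rule in S3-S7 and the placement in the preferred embodiments can be summarized as a small lookup. The following is a minimal Python sketch, assuming files are named `fileN` with N from 1 to 10; the function name `route` is hypothetical and not part of the patent. It illustrates that a failure of node 1 changes which node serves a request but never which disk holds the file.

```python
def route(file_name: str, node1_healthy: bool) -> tuple[str, str]:
    """Return (serving node, disk) for reading or writing fileN.

    Hypothetical helper: placement is fileN -> diskN (preferred embodiment of S3/S4);
    node 1 serves while healthy (S3, S6) and node 2 takes over otherwise (S4, S7).
    """
    n = int(file_name.removeprefix("file"))
    if not 1 <= n <= 10:
        raise ValueError(f"unexpected file name: {file_name}")
    node = "node1" if node1_healthy else "node2"
    return node, f"disk{n}"


for healthy in (True, False):
    print(healthy, route("file1", healthy), route("file10", healthy))
```

Keeping the disk choice independent of node health is what lets node 2 find the data node 1 wrote, since both nodes see the same disks.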
Advantageous effects
The invention provides a single-copy high availability method of a distributed file system. Compared with the prior art, the method has the following beneficial effects:
(1) The method reduces storage overhead and is better suited to cloud-native scenarios. Most current cloud storage products have their own replication mechanisms, so the new scheme can make effective use of cloud storage space without adding extra redundancy.
(2) The method avoids the performance cost of writing multiple copies, and it separates services from data so that service reliability does not affect data availability. In addition, once services and data are separated, the storage devices can be adjusted flexibly without adjusting the service devices.
Drawings
FIG. 1 is an architecture diagram of the single-copy mode in a conventional scheme;
FIG. 2 is an architecture diagram of the multi-copy mode in a conventional scheme;
FIG. 3 is a schematic diagram of an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a single-copy high-availability method for a distributed file system. Conventional storage schemes typically offer a single-copy mode and a multi-copy mode. In the single-copy mode, the storage server keeps only one copy of the user data; in the multi-copy mode, it keeps several copies. Taking an actual scenario as an example, suppose there are ten 100 GB hard disks for storing user data. The single-copy and multi-copy modes then work as follows:
The architecture of the single-copy mode is shown in FIG. 1. In the single-copy mode, the user data access flow is as follows:
1: a user creates 10 files, file1, file2, ..., file10, at a client;
2: the client sends the data of file1-file10 to a storage server;
3: the storage server stores the data files uniformly across the disks, for example file1 on disk 1, file2 on disk 2, and so on, with file10 on disk 10;
4: when the user reads data (such as file1) at the client, the client sends a read request to the storage server;
5: the storage server reads the file content from disk 1 and returns it to the client.
In the single-copy mode, all ten 100 GB data disks can store user data, so space utilization reaches 100%. However, this scheme has a serious drawback: it does not support high availability. If node 1 or node 2 fails (for example, through a network outage or power loss), the files on disks 1-5 or disks 6-10, respectively, become inaccessible.
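To make this failure mode concrete, the sketch below models the FIG. 1 layout as the text describes it, with disks 1-5 local to node 1, disks 6-10 local to node 2, and fileN stored only on disk N. The names `LOCAL_DISKS` and `inaccessible_files` are illustrative assumptions, not part of the patent.

```python
# Illustrative model of the conventional single-copy layout (FIG. 1):
# each disk is local to exactly one node, and fileN lives only on disk N.
LOCAL_DISKS = {"node1": range(1, 6), "node2": range(6, 11)}


def inaccessible_files(failed_node: str) -> list[str]:
    """Files that cannot be served while `failed_node` is down (no second copy exists)."""
    return [f"file{d}" for d in LOCAL_DISKS[failed_node]]


print(inaccessible_files("node1"))
# ['file1', 'file2', 'file3', 'file4', 'file5']
```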
The architecture of the multi-copy mode is shown in FIG. 2 (two copies are taken as an example).
In FIG. 2, node 1 is the master node and node 2 is the standby node. Disks 1-5 on node 1 and disks 6-10 on node 2 mirror each other: whenever data is written to disks 1-5, the same data is also written to disks 6-10. In the two-copy mode, the user data access flow is as follows:
1: a user creates 5 files, file1, file2, ..., file5, at a client;
2: the client sends the data of file1-file5 to a storage server;
3: the storage server stores the data files uniformly across the disks of node 1, for example file1 on disk 1, file2 on disk 2, and so on, with file5 on disk 5; the data of file1-file5 is then synchronously copied to disks 6-10 of node 2;
4: when the user reads data at the client, such as file1, the client sends a read request to the storage server;
5: while node 1 is normal, the storage server reads the content of file1 from disk 1 of node 1 and returns it to the client;
6: when node 1 fails, the storage server reads the content of file1 from disk 6 of node 2 and returns it to the client.
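The two-copy flow can be sketched as follows, assuming (consistent with the description of FIG. 2) that disk k on node 1 is mirrored to disk k+5 on node 2. The class `MirroredStore` is a hypothetical in-memory model, not the implementation behind the figure; it only shows that every write lands on two disks and that a read fails over to the mirror.

```python
class MirroredStore:
    """Illustrative in-memory model of the two-copy mode in FIG. 2."""

    def __init__(self) -> None:
        # Disks 1-5 belong to node 1 (primary); disks 6-10 belong to node 2 (mirror).
        self.disks: dict[int, dict[str, bytes]] = {d: {} for d in range(1, 11)}
        self.node1_up = True

    @staticmethod
    def _index(name: str) -> int:
        k = int(name.removeprefix("file"))
        if not 1 <= k <= 5:
            raise ValueError(f"two-copy example only holds file1-file5, got {name}")
        return k

    def write(self, name: str, data: bytes) -> None:
        k = self._index(name)
        self.disks[k][name] = data      # step 3: primary copy on node 1, disk k
        self.disks[k + 5][name] = data  # step 3: synchronous copy on node 2, disk k+5

    def read(self, name: str) -> tuple[int, bytes]:
        k = self._index(name)
        disk = k if self.node1_up else k + 5  # steps 5-6: fail over to the mirror
        return disk, self.disks[disk][name]


store = MirroredStore()
for i in range(1, 6):
    store.write(f"file{i}", f"content {i}".encode())
store.node1_up = False
print(store.read("file1"))  # (6, b'content 1')
```

Five files occupy ten disk slots here, which is exactly the space cost discussed next.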
This scheme solves the high-availability problem of the single-copy mode but introduces another: low space utilization. In the two-copy mode, only half of the space can store user data and the other half holds the backup, so space utilization is 50%. The more copies there are, the more space is wasted. In short, a single copy does not support high availability, while multiple copies waste space. To solve both problems, we propose this patent: a way to achieve high availability in single-copy mode based on IP-SAN storage.
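The utilization figures follow from dividing raw capacity by the number of copies. The short sketch below uses the document's example of ten 100 GB disks; the three-copy row only extends the same formula and is not a figure stated in the text. The function name `usable_capacity` is illustrative.

```python
def usable_capacity(num_disks: int, disk_gb: int, copies: int) -> tuple[float, float]:
    """Return (usable GB, utilization ratio) when every byte is stored `copies` times."""
    raw_gb = num_disks * disk_gb
    usable_gb = raw_gb / copies
    return usable_gb, usable_gb / raw_gb


for copies in (1, 2, 3):
    usable, ratio = usable_capacity(10, 100, copies)
    label = "1 copy" if copies == 1 else f"{copies} copies"
    print(f"{label}: {usable:.0f} GB usable ({ratio:.0%})")
# 1 copy: 1000 GB usable (100%)
# 2 copies: 500 GB usable (50%)
# 3 copies: 333 GB usable (33%)
```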
As shown in FIG. 3, in this embodiment node 1 is the master node and node 2 is the standby node. Both nodes mount the same disk group over the iSCSI protocol, so every disk is accessible from both node 1 and node 2. The user data access flow is as follows:
1: a user creates 10 files, file1, file2, ..., file10, at a client;
2: the client sends the data of file1-file10 to a storage server;
3: while the service on node 1 is normal, the storage server writes the data files uniformly to disks 1-10 through node 1's iSCSI mounts, for example file1 on disk 1, file2 on disk 2, and so on, with file10 on disk 10;
4: when the service on node 1 fails, the storage server writes the data files uniformly to disks 1-10 through node 2's iSCSI mounts, with the same placement: file1 on disk 1, file2 on disk 2, and so on, with file10 on disk 10;
5: when the user reads data at the client, such as file1, the client sends a read request to the storage server;
6: while node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
7: when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
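The key property of this embodiment is that both nodes address the same disks, so data written through node 1 survives node 1's failure. In the sketch below, a temporary directory shared by two hypothetical `Node` objects stands in for the iSCSI-mounted disk group; a real deployment would attach block devices with an iSCSI initiator, which this example does not model. The helper `active` is likewise an illustrative assumption.

```python
import tempfile
from pathlib import Path


class Node:
    """Hypothetical storage node; `disk_group` stands in for the iSCSI-mounted disks."""

    def __init__(self, name: str, disk_group: Path) -> None:
        self.name = name
        self.disk_group = disk_group
        self.up = True

    def _path(self, file_name: str) -> Path:
        # Same placement as steps 3-4: fileN is stored on diskN.
        n = int(file_name.removeprefix("file"))
        return self.disk_group / f"disk{n}" / file_name

    def write(self, file_name: str, data: bytes) -> None:
        path = self._path(file_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, file_name: str) -> bytes:
        return self._path(file_name).read_bytes()


def active(nodes: list[Node]) -> Node:
    """Node 1 (master) serves while up; otherwise node 2 (standby) takes over."""
    for node in nodes:
        if node.up:
            return node
    raise RuntimeError("no healthy node")


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)  # one disk group, visible to both nodes
    nodes = [Node("node1", shared), Node("node2", shared)]
    for i in range(1, 11):
        active(nodes).write(f"file{i}", f"content {i}".encode())  # steps 1-3
    nodes[0].up = False                                           # node 1 fails
    server = active(nodes)
    result = (server.name, server.read("file1"))                  # steps 5 and 7
print(result)  # ('node2', b'content 1')
```

Unlike the mirrored model, only one copy of each file exists; availability comes from the second node's path to the same disk.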
The main characteristics of the scheme are as follows:
1: decoupling service availability from data availability
In the new approach, we use IP-SAN to provide storage instead of local disks, separating data from services. After separation, service availability no longer affects data availability: the service that provides the data can be switched dynamically, so the data remains available even when a service fails.
2: data access uniqueness
When several services mount the same data disk, differences in service state, network state, and similar conditions create a risk that the disk is mounted by more than one service at the same time and that dirty data is read or written. In our scheme, the storage service uses the Multiple Mount Protection (MMP) feature when mounting data, so that only one service can read and write the data at any given time and no conflicts arise.
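Multiple Mount Protection is an existing mechanism (ext4, for example, offers it as the `mmp` filesystem feature), but the patent does not say which implementation it relies on. The sketch below is a simplified, single-process simulation of the general idea, not ext4's code: the owner keeps bumping a sequence number in a shared block, and a would-be mounter backs off if that number changes while it waits. `MmpBlock`, `try_mount`, `CLEAN`, and the simulated `wait` are all hypothetical.

```python
import secrets
from dataclasses import dataclass

CLEAN = 0  # sketch-only marker meaning "no node currently holds the disk"


@dataclass
class MmpBlock:
    """Stand-in for the protection block stored on the shared disk."""
    seq: int = CLEAN
    owner: str = ""


block = MmpBlock()
alive: set[str] = set()  # nodes whose heartbeat is currently running


def wait() -> None:
    """Simulate waiting a few heartbeat intervals: a live owner bumps the sequence."""
    if block.owner in alive:
        block.seq += 1


def try_mount(node: str) -> bool:
    seen = block.seq
    if seen != CLEAN:
        wait()
        if block.seq != seen:  # sequence moved: another node is still active
            return False
    my_seq = secrets.randbelow(2**31) + 1  # random, never equal to CLEAN
    block.seq, block.owner = my_seq, node
    wait()  # a racing mounter would overwrite my_seq during this window
    if block.seq != my_seq:
        return False
    alive.add(node)  # start this node's heartbeat
    return True


outcomes = [try_mount("node1"), try_mount("node2")]
alive.discard("node1")  # node 1 crashes, so its heartbeat stops
outcomes += [try_mount("node2"), try_mount("node1")]
print(outcomes)  # [True, False, True, False]
```

A crashed owner is detected only indirectly, because its sequence number stops changing; this is why a takeover must wait before claiming the disk.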
3: dynamic handoff of services
In addition to the storage service, a management service is responsible for switching data services. When the primary data service becomes unreachable because of a fault, downtime, a network anomaly, or another abnormal condition, the management service marks the primary service as abnormal and performs a primary-standby switchover. Once the standby service has been promoted to primary, it remounts the storage and resumes serving requests normally. When the original primary service recovers, the management service updates its state again and marks it as the standby, waiting in standby mode.
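The switchover logic above amounts to a small state machine. The following sketch assumes exactly one primary and one standby node; `Role`, `ManagementService`, and its methods are hypothetical names, and the remount itself is reduced to recording which node holds the storage.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"
    FAULTY = "faulty"


class ManagementService:
    """Hypothetical management service tracking one primary and one standby node."""

    def __init__(self, primary: str, standby: str) -> None:
        self.roles = {primary: Role.PRIMARY, standby: Role.STANDBY}
        self.mounted_by = primary  # node that currently has the shared storage mounted

    def _find(self, role: Role) -> str:
        for node, r in self.roles.items():
            if r is role:
                return node
        raise RuntimeError(f"no node in role {role.value}")

    def on_primary_unreachable(self) -> str:
        failed = self._find(Role.PRIMARY)
        standby = self._find(Role.STANDBY)
        self.roles[failed] = Role.FAULTY    # mark the primary's abnormal state
        self.roles[standby] = Role.PRIMARY  # primary-standby switchover
        self.mounted_by = standby           # new primary remounts the storage
        return standby

    def on_recovered(self, node: str) -> None:
        if self.roles.get(node) is Role.FAULTY:
            self.roles[node] = Role.STANDBY  # recovered node waits as standby


mgr = ManagementService("node1", "node2")
new_primary = mgr.on_primary_unreachable()
mgr.on_recovered("node1")
print(new_primary, {n: r.value for n, r in mgr.roles.items()})
# node2 {'node1': 'standby', 'node2': 'primary'}
```

The recovered node is deliberately not promoted back automatically, matching the description that it waits as the standby.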
4: No data redundancy
In this scheme, data availability is no longer affected by service availability. Therefore, data redundancy schemes such as two copies and erasure coding are unnecessary: high availability is still provided while a large amount of disk cost is saved.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (3)

1. A single-copy high-availability method for a distributed file system, characterized in that the method comprises the following steps:
S1, a user creates 10 files, file1, file2, ..., file10, at a client;
S2, the client sends the data of file1-file10 to a storage server;
S3, while the service on node 1 is normal, the storage server writes the data files uniformly to disks 1-10, which node 1 mounts over iSCSI;
S4, when the service on node 1 fails, the storage server writes the data files uniformly to disks 1-10, which node 2 mounts over iSCSI;
S5, when the user reads data at the client, for example file1, the client sends a read request to the storage server;
S6, while node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
and S7, when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
2. The single-copy high-availability method of a distributed file system as claimed in claim 1, wherein in S3, for example, file1 is stored on disk 1, file2 on disk 2, and so on, with file10 stored on disk 10.
3. The single-copy high-availability method of a distributed file system as claimed in claim 1, wherein in S4, for example, file1 is stored on disk 1, file2 on disk 2, and so on, with file10 stored on disk 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310537898.5A CN116841459A (en) 2023-05-15 2023-05-15 Single-copy high-availability method of distributed file system


Publications (1)

Publication Number Publication Date
CN116841459A true CN116841459A (en) 2023-10-03

Family

ID=88169444




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination